Merge lp:~jon-hill/supertree-toolkit/sub_in_subfile into lp:supertree-toolkit
Proposed by: Jon Hill
Status: Merged
Merged at revision: 281
Proposed branch: lp:~jon-hill/supertree-toolkit/sub_in_subfile
Merge into: lp:supertree-toolkit
Diff against target: 13342 lines (+11326/-781), 44 files modified:
  debian/control (+1/-1)
  debian/rules (+1/-0)
  notes.txt (+38/-0)
  stk/bzr_version.py (+5/-5)
  stk/p4/NexusToken.py (+1/-0)
  stk/p4/NexusToken2.py (+1/-1)
  stk/p4/Tree.py (+1/-9)
  stk/p4/Tree_muck.py (+4/-2)
  stk/scripts/check_nomenclature.py (+0/-224)
  stk/scripts/check_nomenclature.py.moved (+224/-0)
  stk/scripts/create_colours_itol.py (+2/-11)
  stk/scripts/create_taxonomy.py (+4/-100)
  stk/scripts/fill_in_with_taxonomy.py (+711/-174)
  stk/scripts/plot_character_taxa_matrix.py (+83/-1)
  stk/scripts/plot_tree_taxa_matrix.py (+56/-0)
  stk/scripts/remove_poorly_constrained_taxa.py (+43/-20)
  stk/scripts/tree_from_taxonomy.py (+142/-0)
  stk/stk (+787/-34)
  stk/stk_exceptions.py (+8/-0)
  stk/supertree_toolkit.py (+849/-47)
  stk/test/_substitute_taxa.py (+19/-1)
  stk/test/_supertree_toolkit.py (+138/-15)
  stk/test/_trees.py (+13/-1)
  stk/test/data/input/auto_sub.phyml (+97/-0)
  stk/test/data/input/check_data_ind.phyml (+141/-0)
  stk/test/data/input/check_taxonomy.phyml (+67/-0)
  stk/test/data/input/check_taxonomy_fixes.phyml (+378/-0)
  stk/test/data/input/create_taxonomy.csv (+6/-6)
  stk/test/data/input/create_taxonomy.phyml (+67/-0)
  stk/test/data/input/equivalents.csv (+5/-0)
  stk/test/data/input/mrca.tre (+1/-0)
  stk/test/data/input/old_stk_test_data_ind.phyml (+1324/-0)
  stk/test/data/input/old_stk_test_data_tax_overlap.phyml (+627/-0)
  stk/test/data/input/old_stk_test_nonmonophyl_removed.phyml (+1324/-0)
  stk/test/data/input/old_stk_test_species_level.phyml (+1324/-0)
  stk/test/data/input/old_stk_test_taxonomy.csv (+334/-0)
  stk/test/data/input/old_stk_test_taxonomy_check_subs.dat (+26/-0)
  stk/test/data/input/old_stk_test_taxonomy_checked.phyml (+1324/-0)
  stk/test/data/input/old_stk_test_taxonomy_checker.csv (+336/-0)
  stk/test/data/output/one_click_subs_output.phyml (+97/-0)
  stk/test/util.py (+7/-0)
  stk_gui/gui/gui.glade (+670/-124)
  stk_gui/plugins/phyml/name_author.py (+4/-1)
  stk_gui/stk_gui/interface.py (+36/-4)
To merge this branch: bzr merge lp:~jon-hill/supertree-toolkit/sub_in_subfile
Related bugs: (none listed)
Reviewer: Jon Hill (Approve)
Review via email: mp+314598@code.launchpad.net
Commit message
Description of the change
Adds taxonomic awareness and fixes a large number of bugs.
Revision history for this message
Jon Hill (jon-hill):
review: Approve
Preview Diff
1 | === modified file 'debian/control' | |||
2 | --- debian/control 2016-12-14 16:22:12 +0000 | |||
3 | +++ debian/control 2017-01-12 09:27:31 +0000 | |||
4 | @@ -9,7 +9,7 @@ | |||
5 | 9 | 9 | ||
6 | 10 | Package: supertree-toolkit | 10 | Package: supertree-toolkit |
7 | 11 | Architecture: all | 11 | Architecture: all |
9 | 12 | Depends: python-tk, python-dxdiff, python-pygraphviz, python-lxml-dbg, python-lxml, python-gtk2, python-numpy, python-matplotlib, python-lxml, libxml2-utils, python, python-gtksourceview2, python-glade2, python-networkx | 12 | Depends: python-tk, python-simplejson, python-dxdiff, python-pygraphviz, python-lxml-dbg, python-lxml, python-gtk2, python-numpy, python-matplotlib, python-lxml, libxml2-utils, python, python-gtksourceview2, python-glade2, python-networkx, python-argcomplete |
10 | 13 | Recommends: python-psyco | 13 | Recommends: python-psyco |
11 | 14 | Suggests: | 14 | Suggests: |
12 | 15 | Conflicts: | 15 | Conflicts: |
13 | 16 | 16 | ||
14 | === modified file 'debian/rules' | |||
15 | --- debian/rules 2013-10-14 12:58:59 +0000 | |||
16 | +++ debian/rules 2017-01-12 09:27:31 +0000 | |||
17 | @@ -6,5 +6,6 @@ | |||
18 | 6 | 6 | ||
19 | 7 | override_dh_auto_install: | 7 | override_dh_auto_install: |
20 | 8 | python setup.py install --root=debian/supertree-toolkit --install-layout=deb --install-scripts=/usr/bin | 8 | python setup.py install --root=debian/supertree-toolkit --install-layout=deb --install-scripts=/usr/bin |
21 | 9 | argcomplete.autocomplete(parser) | ||
22 | 9 | 10 | ||
23 | 10 | override_dh_auto_build: | 11 | override_dh_auto_build: |
24 | 11 | 12 | ||
25 | === added file 'notes.txt' | |||
26 | --- notes.txt 1970-01-01 00:00:00 +0000 | |||
27 | +++ notes.txt 2017-01-12 09:27:31 +0000 | |||
28 | @@ -0,0 +1,38 @@ | |||
29 | 1 | Ideas: | ||
30 | 2 | |||
31 | 3 | Collect data, remove paraphyletic | ||
32 | 4 | |||
33 | 5 | Take taxonomy (from dbs), phyml, users knowledge (encoded as subs file) and information on synonyms (from dbs) | ||
34 | 6 | to create a master subs file that takes the dat to species level | ||
35 | 7 | |||
36 | 8 | User needs to be able to edit taxonomy - CSV file | ||
37 | 9 | |||
38 | 10 | User needs to choose database source - preferred source. | ||
39 | 11 | |||
40 | 12 | |||
41 | 13 | Taxonomic name checker: | ||
42 | 14 | |||
43 | 15 | - use database to get synonyms and possible mispellings | ||
44 | 16 | - Gui is a 2 column table with green, yellow, red. User filles in red (or removes it), green is fine. Yellow - drop down list with alternatives. | ||
45 | 17 | - Use this to generate a two column CSV file | ||
46 | 18 | - On CLI, generate a three column CSV. Original name, new name (or blank for unknown) and a list of possibles. Warn user they *must* fill in the second column or remove the row or the taxa will be deleted. | ||
47 | 19 | |||
48 | 20 | For colloqual names, user adds to column 1 of taxonomy csv and then adds the latin name in the approriate column of the database. The subs can then generate the species list. | ||
49 | 21 | |||
50 | 22 | Use these two csv files to generate a subs file, including replacing higher taxa and genera to create a "to species" substtution (can also output this file for later) | ||
51 | 23 | |||
52 | 24 | Generating data to any taxonomic level can happen later - need to check each species is accounted for in the taxonomy, with correct levels - may need another parse of the taxonomy csv | ||
53 | 25 | |||
54 | 26 | |||
55 | 27 | Add data -> paraphyletic taxa -> taxonomy checker -> sub synonyms -> taxonomy generator -> create species level dataset | ||
56 | 28 | |||
57 | 29 | New functions: | ||
58 | 30 | - taxonomic name checker (this might take a while when online for large dataset) - note that this should be a one for one substitution - seperate function so we can check this? | ||
59 | 31 | - Pull in taxonomy generator | ||
60 | 32 | - Add csv file to schema | ||
61 | 33 | - amaend manual with workflow | ||
62 | 34 | - warning on multiple subs in data in manual | ||
63 | 35 | - generate species level subsfile from taxonomy | ||
64 | 36 | - generate specified taxonomic level data | ||
65 | 37 | |||
66 | 38 | |||
67 | 0 | 39 | ||
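The notes above sketch the name-checker output: a CSV keyed on the original taxon name, followed by a semicolon-separated list of candidate names and a status colour (green, yellow, amber or red). Below is a minimal, illustrative parser for that layout, modelled on the load_equivalents helper further down this diff; the file name is a placeholder, not part of the branch.

import csv

def load_name_checker_csv(path):
    # Each row: original_name, synonym1;synonym2;..., status
    # where status is one of 'green', 'yellow', 'amber' or 'red'.
    equivalents = {}
    with open(path, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader)  # skip the header row
        for row in reader:
            equivalents[row[0]] = [row[1].split(';'), row[2]]
    return equivalents

if __name__ == '__main__':
    # 'taxonomy_checker.csv' is a hypothetical name for the checker output.
    equivs = load_name_checker_csv('taxonomy_checker.csv')
    for name, (synonyms, status) in sorted(equivs.items()):
        if status == 'red':
            # As the notes warn: unresolved (red) taxa are dropped unless the
            # user supplies a replacement or removes the row.
            print(name + ' is unresolved and will be dropped unless fixed')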
68 | === modified file 'stk/bzr_version.py' | |||
69 | --- stk/bzr_version.py 2017-01-11 17:42:56 +0000 | |||
70 | +++ stk/bzr_version.py 2017-01-12 09:27:31 +0000 | |||
71 | @@ -4,12 +4,12 @@ | |||
72 | 4 | So don't edit it. :) | 4 | So don't edit it. :) |
73 | 5 | """ | 5 | """ |
74 | 6 | 6 | ||
77 | 7 | version_info = {'branch_nick': u'supertree-toolkit', | 7 | version_info = {'branch_nick': u'sub_in_subfile', |
78 | 8 | 'build_date': '2017-01-11 17:42:27 +0000', | 8 | 'build_date': '2017-01-11 17:48:33 +0000', |
79 | 9 | 'clean': None, | 9 | 'clean': None, |
83 | 10 | 'date': '2017-01-11 17:39:43 +0000', | 10 | 'date': '2017-01-11 17:48:18 +0000', |
84 | 11 | 'revision_id': 'jon.hill@imperial.ac.uk-20170111173943-88so1icr33su3afo', | 11 | 'revision_id': 'jon.hill@imperial.ac.uk-20170111174818-9q8a9octvnawruuw', |
85 | 12 | 'revno': '279'} | 12 | 'revno': '317'} |
86 | 13 | 13 | ||
87 | 14 | revisions = {} | 14 | revisions = {} |
88 | 15 | 15 | ||
89 | 16 | 16 | ||
90 | === modified file 'stk/p4/NexusToken.py' | |||
91 | --- stk/p4/NexusToken.py 2012-01-11 08:57:43 +0000 | |||
92 | +++ stk/p4/NexusToken.py 2017-01-12 09:27:31 +0000 | |||
93 | @@ -44,6 +44,7 @@ | |||
94 | 44 | gm = ["safeNextTok(), called from %s" % caller] | 44 | gm = ["safeNextTok(), called from %s" % caller] |
95 | 45 | else: | 45 | else: |
96 | 46 | gm = ["safeNextTok()"] | 46 | gm = ["safeNextTok()"] |
97 | 47 | print flob | ||
98 | 47 | gm.append("Premature Death.") | 48 | gm.append("Premature Death.") |
99 | 48 | gm.append("Ran out of understandable things to read in nexus file.") | 49 | gm.append("Ran out of understandable things to read in nexus file.") |
100 | 49 | raise Glitch, gm | 50 | raise Glitch, gm |
101 | 50 | 51 | ||
102 | === modified file 'stk/p4/NexusToken2.py' | |||
103 | --- stk/p4/NexusToken2.py 2012-01-11 08:57:43 +0000 | |||
104 | +++ stk/p4/NexusToken2.py 2017-01-12 09:27:31 +0000 | |||
105 | @@ -88,7 +88,7 @@ | |||
106 | 88 | else: | 88 | else: |
107 | 89 | gm = ["safeNextTok()"] | 89 | gm = ["safeNextTok()"] |
108 | 90 | gm.append("Premature Death.") | 90 | gm.append("Premature Death.") |
110 | 91 | gm.append("Ran out of understandable things to read in nexus file.") | 91 | gm.append("Ran out of understandable things to read in nexus file." + str(flob)) |
111 | 92 | raise Glitch, gm | 92 | raise Glitch, gm |
112 | 93 | else: | 93 | else: |
113 | 94 | return t | 94 | return t |
114 | 95 | 95 | ||
115 | === modified file 'stk/p4/Tree.py' | |||
116 | --- stk/p4/Tree.py 2013-08-25 09:24:34 +0000 | |||
117 | +++ stk/p4/Tree.py 2017-01-12 09:27:31 +0000 | |||
118 | @@ -996,17 +996,9 @@ | |||
119 | 996 | if not item.name: | 996 | if not item.name: |
120 | 997 | if item == self.root: | 997 | if item == self.root: |
121 | 998 | if var.fixRootedTrees: | 998 | if var.fixRootedTrees: |
127 | 999 | if self.name: | 999 | #print "Fixing tree to work with SuperTree scores" |
123 | 1000 | print "Tree.initFinish() tree '%s'" % self.name | ||
124 | 1001 | else: | ||
125 | 1002 | print 'Tree.initFinish()' | ||
126 | 1003 | print "Fixing tree to work with SuperTree scores" | ||
128 | 1004 | self.removeRoot() | 1000 | self.removeRoot() |
129 | 1005 | elif var.warnAboutTerminalRootWithNoName: | 1001 | elif var.warnAboutTerminalRootWithNoName: |
130 | 1006 | if self.name: | ||
131 | 1007 | print "Tree.initFinish() tree '%s'" % self.name | ||
132 | 1008 | else: | ||
133 | 1009 | print 'Tree.initFinish()' | ||
134 | 1010 | print ' Non-fatal warning: the root is terminal, but has no name.' | 1002 | print ' Non-fatal warning: the root is terminal, but has no name.' |
135 | 1011 | print ' This may be what you wanted. Or not?' | 1003 | print ' This may be what you wanted. Or not?' |
136 | 1012 | print ' (To get rid of this warning, turn off var.warnAboutTerminalRootWithNoName)' | 1004 | print ' (To get rid of this warning, turn off var.warnAboutTerminalRootWithNoName)' |
137 | 1013 | 1005 | ||
138 | === modified file 'stk/p4/Tree_muck.py' | |||
139 | --- stk/p4/Tree_muck.py 2015-02-19 14:47:06 +0000 | |||
140 | +++ stk/p4/Tree_muck.py 2017-01-12 09:27:31 +0000 | |||
141 | @@ -769,6 +769,7 @@ | |||
142 | 769 | else: | 769 | else: |
143 | 770 | gm.append("The 2 specified nodes should have a parent-child relationship") | 770 | gm.append("The 2 specified nodes should have a parent-child relationship") |
144 | 771 | raise Glitch, gm | 771 | raise Glitch, gm |
145 | 772 | |||
146 | 772 | if var.usePfAndNumpy: | 773 | if var.usePfAndNumpy: |
147 | 773 | self.deleteCStuff() | 774 | self.deleteCStuff() |
148 | 774 | 775 | ||
149 | @@ -1629,7 +1630,7 @@ | |||
150 | 1629 | 1630 | ||
151 | 1630 | 1631 | ||
152 | 1631 | 1632 | ||
154 | 1632 | def addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None): | 1633 | def addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None, ignoreRootAssert=False): |
155 | 1633 | """Add a subtree to a tree. | 1634 | """Add a subtree to a tree. |
156 | 1634 | 1635 | ||
157 | 1635 | The nodes from theSubTree are added to self.nodes, and theSubTree | 1636 | The nodes from theSubTree are added to self.nodes, and theSubTree |
158 | @@ -1666,7 +1667,8 @@ | |||
159 | 1666 | 1667 | ||
160 | 1667 | assert selfNode in self.nodes | 1668 | assert selfNode in self.nodes |
161 | 1668 | assert selfNode.parent | 1669 | assert selfNode.parent |
163 | 1669 | assert theSubTree.root.leftChild and not theSubTree.root.leftChild.sibling # its a root on a stick | 1670 | if not ignoreRootAssert: |
164 | 1671 | assert theSubTree.root.leftChild and not theSubTree.root.leftChild.sibling # its a root on a stick | ||
165 | 1670 | if not subTreeTaxNames: | 1672 | if not subTreeTaxNames: |
166 | 1671 | subTreeTaxNames = [n.name for n in theSubTree.iterLeavesNoRoot()] | 1673 | subTreeTaxNames = [n.name for n in theSubTree.iterLeavesNoRoot()] |
167 | 1672 | 1674 | ||
168 | 1673 | 1675 | ||
169 | === removed file 'stk/scripts/check_nomenclature.py' | |||
170 | --- stk/scripts/check_nomenclature.py 2016-07-14 10:12:17 +0000 | |||
171 | +++ stk/scripts/check_nomenclature.py 1970-01-01 00:00:00 +0000 | |||
172 | @@ -1,224 +0,0 @@ | |||
173 | 1 | #!/usr/bin/env python | ||
174 | 2 | # | ||
175 | 3 | # Derived from the Supertree Toolkit. Software for managing and manipulating sources | ||
176 | 4 | # trees ready for supretree construction. | ||
177 | 5 | # Copyright (C) 2015, Jon Hill, Katie Davis | ||
178 | 6 | # | ||
179 | 7 | # This program is free software: you can redistribute it and/or modify | ||
180 | 8 | # it under the terms of the GNU General Public License as published by | ||
181 | 9 | # the Free Software Foundation, either version 3 of the License, or | ||
182 | 10 | # (at your option) any later version. | ||
183 | 11 | # | ||
184 | 12 | # This program is distributed in the hope that it will be useful, | ||
185 | 13 | # but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
186 | 14 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
187 | 15 | # GNU General Public License for more details. | ||
188 | 16 | # | ||
189 | 17 | # You should have received a copy of the GNU General Public License | ||
190 | 18 | # along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
191 | 19 | # | ||
192 | 20 | # Jon Hill. jon.hill@york.ac.uk. | ||
193 | 21 | # | ||
194 | 22 | # | ||
195 | 23 | # This is an enitrely self-contained script that does not require the STK to be installed. | ||
196 | 24 | |||
197 | 25 | import urllib2 | ||
198 | 26 | from urllib import quote_plus | ||
199 | 27 | import simplejson as json | ||
200 | 28 | import argparse | ||
201 | 29 | import os | ||
202 | 30 | import sys | ||
203 | 31 | import csv | ||
204 | 32 | |||
205 | 33 | def main(): | ||
206 | 34 | |||
207 | 35 | # do stuff | ||
208 | 36 | parser = argparse.ArgumentParser( | ||
209 | 37 | prog="Check nomenclature", | ||
210 | 38 | description="Check nomenclature from a tree file or list against valid names derived from EOL", | ||
211 | 39 | ) | ||
212 | 40 | parser.add_argument( | ||
213 | 41 | '-v', | ||
214 | 42 | '--verbose', | ||
215 | 43 | action='store_true', | ||
216 | 44 | help="Verbose output: mainly progress reports.", | ||
217 | 45 | default=False | ||
218 | 46 | ) | ||
219 | 47 | parser.add_argument( | ||
220 | 48 | '--existing', | ||
221 | 49 | help="An existing output file to update further, e.g. with a new set of taxa. Supply the file name." | ||
222 | 50 | ) | ||
223 | 51 | parser.add_argument( | ||
224 | 52 | 'input_file', | ||
225 | 53 | metavar='input_file', | ||
226 | 54 | nargs=1, | ||
227 | 55 | help="Your input taxa list" | ||
228 | 56 | ) | ||
229 | 57 | parser.add_argument( | ||
230 | 58 | 'output_file', | ||
231 | 59 | metavar='output_file', | ||
232 | 60 | nargs=1, | ||
233 | 61 | help="The output file. A CSV-based output, listing name checked, valid name, synonyms and status (red, amber, yellow, green)." | ||
234 | 62 | ) | ||
235 | 63 | |||
236 | 64 | args = parser.parse_args() | ||
237 | 65 | verbose = args.verbose | ||
238 | 66 | input_file = args.input_file[0] | ||
239 | 67 | output_file = args.output_file[0] | ||
240 | 68 | existing_data = args.existing | ||
241 | 69 | |||
242 | 70 | if (not existing_data == None): | ||
243 | 71 | exiting_data = load_equivalents(existing_data) | ||
244 | 72 | else: | ||
245 | 73 | existing_data = None | ||
246 | 74 | |||
247 | 75 | with open(input_file,'r') as f: | ||
248 | 76 | lines = f.read().splitlines() | ||
249 | 77 | equivs = taxonomic_checker_list(lines, existing_data, verbose=verbose) | ||
250 | 78 | |||
251 | 79 | |||
252 | 80 | f = open(output_file,"w") | ||
253 | 81 | for taxon in sorted(equivs.keys()): | ||
254 | 82 | f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n") | ||
255 | 83 | f.close() | ||
256 | 84 | |||
257 | 85 | return | ||
258 | 86 | |||
259 | 87 | |||
260 | 88 | def taxonomic_checker_list(name_list,existing_data=None,verbose=False): | ||
261 | 89 | """ For each name in the database generate a database of the original name, | ||
262 | 90 | possible synonyms and if the taxon is not know, signal that. We do this by | ||
263 | 91 | using the EoL API to grab synonyms of each taxon. """ | ||
264 | 92 | |||
265 | 93 | |||
266 | 94 | if existing_data == None: | ||
267 | 95 | equivalents = {} | ||
268 | 96 | else: | ||
269 | 97 | equivalents = existing_data | ||
270 | 98 | |||
271 | 99 | # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result? | ||
272 | 100 | # if not, is there another API function to do this? | ||
273 | 101 | # search for the taxon and grab the name - if you search for a recognised synonym on EoL then | ||
274 | 102 | # you get the original ('correct') name - shorten this to two words and you're done. | ||
275 | 103 | for t in name_list: | ||
276 | 104 | # make sure t has no spaces. | ||
277 | 105 | t = t.replace(" ","_") | ||
278 | 106 | if t in equivalents: | ||
279 | 107 | continue | ||
280 | 108 | taxon = t.replace("_"," ") | ||
281 | 109 | if (verbose): | ||
282 | 110 | print "Looking up ", taxon | ||
283 | 111 | # get the data from EOL on taxon | ||
284 | 112 | taxonq = quote_plus(taxon) | ||
285 | 113 | URL = "http://eol.org/api/search/1.0.json?q="+taxonq | ||
286 | 114 | req = urllib2.Request(URL) | ||
287 | 115 | opener = urllib2.build_opener() | ||
288 | 116 | f = opener.open(req) | ||
289 | 117 | data = json.load(f) | ||
290 | 118 | # check if there's some data | ||
291 | 119 | if len(data['results']) == 0: | ||
292 | 120 | equivalents[t] = [[t],'red'] | ||
293 | 121 | continue | ||
294 | 122 | amber = False | ||
295 | 123 | if len(data['results']) > 1: | ||
296 | 124 | # this is not great - we have multiple hits for this taxon - needs the user to go back and warn about this | ||
297 | 125 | # for automatic processing we'll just take the first one though | ||
298 | 126 | # colour is amber in this case | ||
299 | 127 | amber = True | ||
300 | 128 | ID = str(data['results'][0]['id']) # take first hit | ||
301 | 129 | URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=2&videos=0&sounds=0&maps=0&text=2&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0" | ||
302 | 130 | req = urllib2.Request(URL) | ||
303 | 131 | opener = urllib2.build_opener() | ||
304 | 132 | |||
305 | 133 | try: | ||
306 | 134 | f = opener.open(req) | ||
307 | 135 | except urllib2.HTTPError: | ||
308 | 136 | equivalents[t] = [[t],'red'] | ||
309 | 137 | continue | ||
310 | 138 | data = json.load(f) | ||
311 | 139 | if len(data['scientificName']) == 0: | ||
312 | 140 | # not found a scientific name, so set as red | ||
313 | 141 | equivalents[t] = [[t],'red'] | ||
314 | 142 | continue | ||
315 | 143 | correct_name = data['scientificName'].encode("ascii","ignore") | ||
316 | 144 | # we only want the first two bits of the name, not the original author and year if any | ||
317 | 145 | temp_name = correct_name.split(' ') | ||
318 | 146 | if (len(temp_name) > 2): | ||
319 | 147 | correct_name = ' '.join(temp_name[0:2]) | ||
320 | 148 | correct_name = correct_name.replace(' ','_') | ||
321 | 149 | print correct_name, t | ||
322 | 150 | |||
323 | 151 | # build up the output dictionary - original name is key, synonyms/missing is value | ||
324 | 152 | if (correct_name == t or correct_name == taxon): | ||
325 | 153 | # if the original matches the 'correct', then it's green | ||
326 | 154 | equivalents[t] = [[t], 'green'] | ||
327 | 155 | else: | ||
328 | 156 | # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the | ||
329 | 157 | # 'correct' taxon at the top | ||
330 | 158 | eol_synonyms = data['synonyms'] | ||
331 | 159 | synonyms = [] | ||
332 | 160 | for s in eol_synonyms: | ||
333 | 161 | ts = s['synonym'].encode("ascii","ignore") | ||
334 | 162 | temp_syn = ts.split(' ') | ||
335 | 163 | if (len(temp_syn) > 2): | ||
336 | 164 | temp_syn = ' '.join(temp_syn[0:2]) | ||
337 | 165 | ts = temp_syn | ||
338 | 166 | if (s['relationship'] == "synonym"): | ||
339 | 167 | ts = ts.replace(" ","_") | ||
340 | 168 | synonyms.append(ts) | ||
341 | 169 | synonyms = _uniquify(synonyms) | ||
342 | 170 | # we need to put the correct name at the top of the list now | ||
343 | 171 | if (correct_name in synonyms): | ||
344 | 172 | synonyms.insert(0, synonyms.pop(synonyms.index(correct_name))) | ||
345 | 173 | elif len(synonyms) == 0: | ||
346 | 174 | synonyms.append(correct_name) | ||
347 | 175 | else: | ||
348 | 176 | synonyms.insert(0,correct_name) | ||
349 | 177 | |||
350 | 178 | if (amber): | ||
351 | 179 | equivalents[t] = [synonyms,'amber'] | ||
352 | 180 | else: | ||
353 | 181 | equivalents[t] = [synonyms,'yellow'] | ||
354 | 182 | # if our search was empty, then it's red - see above | ||
355 | 183 | |||
356 | 184 | # up to the calling funciton to do something sensible with this | ||
357 | 185 | # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow, red. | ||
358 | 186 | # Amber means we found synonyms and multilpe hits. User def needs to sort these! | ||
359 | 187 | |||
360 | 188 | return equivalents | ||
361 | 189 | |||
362 | 190 | def load_equivalents(equiv_csv): | ||
363 | 191 | """Load equivalents data from a csv and convert to a equivalents Dict. | ||
364 | 192 | Structure is key, with a list that is array of synonyms, followed by status ('green', | ||
365 | 193 | 'yellow', 'amber', or 'red'). | ||
366 | 194 | |||
367 | 195 | """ | ||
368 | 196 | |||
369 | 197 | import csv | ||
370 | 198 | |||
371 | 199 | equivalents = {} | ||
372 | 200 | |||
373 | 201 | with open(equiv_csv, 'rU') as csvfile: | ||
374 | 202 | equiv_reader = csv.reader(csvfile, delimiter=',') | ||
375 | 203 | equiv_reader.next() # skip header | ||
376 | 204 | for row in equiv_reader: | ||
377 | 205 | i = 1 | ||
378 | 206 | equivalents[row[0]] = [row[1].split(';'),row[2]] | ||
379 | 207 | |||
380 | 208 | return equivalents | ||
381 | 209 | |||
382 | 210 | def _uniquify(l): | ||
383 | 211 | """ | ||
384 | 212 | Make a list, l, contain only unique data | ||
385 | 213 | """ | ||
386 | 214 | keys = {} | ||
387 | 215 | for e in l: | ||
388 | 216 | keys[e] = 1 | ||
389 | 217 | |||
390 | 218 | return keys.keys() | ||
391 | 219 | |||
392 | 220 | if __name__ == "__main__": | ||
393 | 221 | main() | ||
394 | 222 | |||
395 | 223 | |||
396 | 224 | |||
397 | 225 | 0 | ||
398 | === added file 'stk/scripts/check_nomenclature.py.moved' | |||
399 | --- stk/scripts/check_nomenclature.py.moved 1970-01-01 00:00:00 +0000 | |||
400 | +++ stk/scripts/check_nomenclature.py.moved 2017-01-12 09:27:31 +0000 | |||
401 | @@ -0,0 +1,224 @@ | |||
402 | 1 | #!/usr/bin/env python | ||
403 | 2 | # | ||
404 | 3 | # Derived from the Supertree Toolkit. Software for managing and manipulating sources | ||
405 | 4 | # trees ready for supretree construction. | ||
406 | 5 | # Copyright (C) 2015, Jon Hill, Katie Davis | ||
407 | 6 | # | ||
408 | 7 | # This program is free software: you can redistribute it and/or modify | ||
409 | 8 | # it under the terms of the GNU General Public License as published by | ||
410 | 9 | # the Free Software Foundation, either version 3 of the License, or | ||
411 | 10 | # (at your option) any later version. | ||
412 | 11 | # | ||
413 | 12 | # This program is distributed in the hope that it will be useful, | ||
414 | 13 | # but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
415 | 14 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
416 | 15 | # GNU General Public License for more details. | ||
417 | 16 | # | ||
418 | 17 | # You should have received a copy of the GNU General Public License | ||
419 | 18 | # along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
420 | 19 | # | ||
421 | 20 | # Jon Hill. jon.hill@york.ac.uk. | ||
422 | 21 | # | ||
423 | 22 | # | ||
424 | 23 | # This is an enitrely self-contained script that does not require the STK to be installed. | ||
425 | 24 | |||
426 | 25 | import urllib2 | ||
427 | 26 | from urllib import quote_plus | ||
428 | 27 | import simplejson as json | ||
429 | 28 | import argparse | ||
430 | 29 | import os | ||
431 | 30 | import sys | ||
432 | 31 | import csv | ||
433 | 32 | |||
434 | 33 | def main(): | ||
435 | 34 | |||
436 | 35 | # do stuff | ||
437 | 36 | parser = argparse.ArgumentParser( | ||
438 | 37 | prog="Check nomenclature", | ||
439 | 38 | description="Check nomenclature from a tree file or list against valid names derived from EOL", | ||
440 | 39 | ) | ||
441 | 40 | parser.add_argument( | ||
442 | 41 | '-v', | ||
443 | 42 | '--verbose', | ||
444 | 43 | action='store_true', | ||
445 | 44 | help="Verbose output: mainly progress reports.", | ||
446 | 45 | default=False | ||
447 | 46 | ) | ||
448 | 47 | parser.add_argument( | ||
449 | 48 | '--existing', | ||
450 | 49 | help="An existing output file to update further, e.g. with a new set of taxa. Supply the file name." | ||
451 | 50 | ) | ||
452 | 51 | parser.add_argument( | ||
453 | 52 | 'input_file', | ||
454 | 53 | metavar='input_file', | ||
455 | 54 | nargs=1, | ||
456 | 55 | help="Your input taxa list" | ||
457 | 56 | ) | ||
458 | 57 | parser.add_argument( | ||
459 | 58 | 'output_file', | ||
460 | 59 | metavar='output_file', | ||
461 | 60 | nargs=1, | ||
462 | 61 | help="The output file. A CSV-based output, listing name checked, valid name, synonyms and status (red, amber, yellow, green)." | ||
463 | 62 | ) | ||
464 | 63 | |||
465 | 64 | args = parser.parse_args() | ||
466 | 65 | verbose = args.verbose | ||
467 | 66 | input_file = args.input_file[0] | ||
468 | 67 | output_file = args.output_file[0] | ||
469 | 68 | existing_data = args.existing | ||
470 | 69 | |||
471 | 70 | if (not existing_data == None): | ||
472 | 71 | exiting_data = load_equivalents(existing_data) | ||
473 | 72 | else: | ||
474 | 73 | existing_data = None | ||
475 | 74 | |||
476 | 75 | with open(input_file,'r') as f: | ||
477 | 76 | lines = f.read().splitlines() | ||
478 | 77 | equivs = taxonomic_checker_list(lines, existing_data, verbose=verbose) | ||
479 | 78 | |||
480 | 79 | |||
481 | 80 | f = open(output_file,"w") | ||
482 | 81 | for taxon in sorted(equivs.keys()): | ||
483 | 82 | f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n") | ||
484 | 83 | f.close() | ||
485 | 84 | |||
486 | 85 | return | ||
487 | 86 | |||
488 | 87 | |||
489 | 88 | def taxonomic_checker_list(name_list,existing_data=None,verbose=False): | ||
490 | 89 | """ For each name in the database generate a database of the original name, | ||
491 | 90 | possible synonyms and if the taxon is not know, signal that. We do this by | ||
492 | 91 | using the EoL API to grab synonyms of each taxon. """ | ||
493 | 92 | |||
494 | 93 | |||
495 | 94 | if existing_data == None: | ||
496 | 95 | equivalents = {} | ||
497 | 96 | else: | ||
498 | 97 | equivalents = existing_data | ||
499 | 98 | |||
500 | 99 | # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result? | ||
501 | 100 | # if not, is there another API function to do this? | ||
502 | 101 | # search for the taxon and grab the name - if you search for a recognised synonym on EoL then | ||
503 | 102 | # you get the original ('correct') name - shorten this to two words and you're done. | ||
504 | 103 | for t in name_list: | ||
505 | 104 | # make sure t has no spaces. | ||
506 | 105 | t = t.replace(" ","_") | ||
507 | 106 | if t in equivalents: | ||
508 | 107 | continue | ||
509 | 108 | taxon = t.replace("_"," ") | ||
510 | 109 | if (verbose): | ||
511 | 110 | print "Looking up ", taxon | ||
512 | 111 | # get the data from EOL on taxon | ||
513 | 112 | taxonq = quote_plus(taxon) | ||
514 | 113 | URL = "http://eol.org/api/search/1.0.json?q="+taxonq | ||
515 | 114 | req = urllib2.Request(URL) | ||
516 | 115 | opener = urllib2.build_opener() | ||
517 | 116 | f = opener.open(req) | ||
518 | 117 | data = json.load(f) | ||
519 | 118 | # check if there's some data | ||
520 | 119 | if len(data['results']) == 0: | ||
521 | 120 | equivalents[t] = [[t],'red'] | ||
522 | 121 | continue | ||
523 | 122 | amber = False | ||
524 | 123 | if len(data['results']) > 1: | ||
525 | 124 | # this is not great - we have multiple hits for this taxon - needs the user to go back and warn about this | ||
526 | 125 | # for automatic processing we'll just take the first one though | ||
527 | 126 | # colour is amber in this case | ||
528 | 127 | amber = True | ||
529 | 128 | ID = str(data['results'][0]['id']) # take first hit | ||
530 | 129 | URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=2&videos=0&sounds=0&maps=0&text=2&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0" | ||
531 | 130 | req = urllib2.Request(URL) | ||
532 | 131 | opener = urllib2.build_opener() | ||
533 | 132 | |||
534 | 133 | try: | ||
535 | 134 | f = opener.open(req) | ||
536 | 135 | except urllib2.HTTPError: | ||
537 | 136 | equivalents[t] = [[t],'red'] | ||
538 | 137 | continue | ||
539 | 138 | data = json.load(f) | ||
540 | 139 | if len(data['scientificName']) == 0: | ||
541 | 140 | # not found a scientific name, so set as red | ||
542 | 141 | equivalents[t] = [[t],'red'] | ||
543 | 142 | continue | ||
544 | 143 | correct_name = data['scientificName'].encode("ascii","ignore") | ||
545 | 144 | # we only want the first two bits of the name, not the original author and year if any | ||
546 | 145 | temp_name = correct_name.split(' ') | ||
547 | 146 | if (len(temp_name) > 2): | ||
548 | 147 | correct_name = ' '.join(temp_name[0:2]) | ||
549 | 148 | correct_name = correct_name.replace(' ','_') | ||
550 | 149 | print correct_name, t | ||
551 | 150 | |||
552 | 151 | # build up the output dictionary - original name is key, synonyms/missing is value | ||
553 | 152 | if (correct_name == t or correct_name == taxon): | ||
554 | 153 | # if the original matches the 'correct', then it's green | ||
555 | 154 | equivalents[t] = [[t], 'green'] | ||
556 | 155 | else: | ||
557 | 156 | # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the | ||
558 | 157 | # 'correct' taxon at the top | ||
559 | 158 | eol_synonyms = data['synonyms'] | ||
560 | 159 | synonyms = [] | ||
561 | 160 | for s in eol_synonyms: | ||
562 | 161 | ts = s['synonym'].encode("ascii","ignore") | ||
563 | 162 | temp_syn = ts.split(' ') | ||
564 | 163 | if (len(temp_syn) > 2): | ||
565 | 164 | temp_syn = ' '.join(temp_syn[0:2]) | ||
566 | 165 | ts = temp_syn | ||
567 | 166 | if (s['relationship'] == "synonym"): | ||
568 | 167 | ts = ts.replace(" ","_") | ||
569 | 168 | synonyms.append(ts) | ||
570 | 169 | synonyms = _uniquify(synonyms) | ||
571 | 170 | # we need to put the correct name at the top of the list now | ||
572 | 171 | if (correct_name in synonyms): | ||
573 | 172 | synonyms.insert(0, synonyms.pop(synonyms.index(correct_name))) | ||
574 | 173 | elif len(synonyms) == 0: | ||
575 | 174 | synonyms.append(correct_name) | ||
576 | 175 | else: | ||
577 | 176 | synonyms.insert(0,correct_name) | ||
578 | 177 | |||
579 | 178 | if (amber): | ||
580 | 179 | equivalents[t] = [synonyms,'amber'] | ||
581 | 180 | else: | ||
582 | 181 | equivalents[t] = [synonyms,'yellow'] | ||
583 | 182 | # if our search was empty, then it's red - see above | ||
584 | 183 | |||
585 | 184 | # up to the calling funciton to do something sensible with this | ||
586 | 185 | # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow, red. | ||
587 | 186 | # Amber means we found synonyms and multilpe hits. User def needs to sort these! | ||
588 | 187 | |||
589 | 188 | return equivalents | ||
590 | 189 | |||
591 | 190 | def load_equivalents(equiv_csv): | ||
592 | 191 | """Load equivalents data from a csv and convert to a equivalents Dict. | ||
593 | 192 | Structure is key, with a list that is array of synonyms, followed by status ('green', | ||
594 | 193 | 'yellow', 'amber', or 'red'). | ||
595 | 194 | |||
596 | 195 | """ | ||
597 | 196 | |||
598 | 197 | import csv | ||
599 | 198 | |||
600 | 199 | equivalents = {} | ||
601 | 200 | |||
602 | 201 | with open(equiv_csv, 'rU') as csvfile: | ||
603 | 202 | equiv_reader = csv.reader(csvfile, delimiter=',') | ||
604 | 203 | equiv_reader.next() # skip header | ||
605 | 204 | for row in equiv_reader: | ||
606 | 205 | i = 1 | ||
607 | 206 | equivalents[row[0]] = [row[1].split(';'),row[2]] | ||
608 | 207 | |||
609 | 208 | return equivalents | ||
610 | 209 | |||
611 | 210 | def _uniquify(l): | ||
612 | 211 | """ | ||
613 | 212 | Make a list, l, contain only unique data | ||
614 | 213 | """ | ||
615 | 214 | keys = {} | ||
616 | 215 | for e in l: | ||
617 | 216 | keys[e] = 1 | ||
618 | 217 | |||
619 | 218 | return keys.keys() | ||
620 | 219 | |||
621 | 220 | if __name__ == "__main__": | ||
622 | 221 | main() | ||
623 | 222 | |||
624 | 223 | |||
625 | 224 | |||
626 | 0 | 225 | ||
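For orientation, the self-contained script preserved above can also be driven from Python rather than the command line. The following is only a rough usage sketch: it assumes the module is importable as check_nomenclature, the taxon names and output path are illustrative, and each returned value is a [synonyms, status] pair as documented in taxonomic_checker_list.

# Illustrative only: requires the script on the Python path and network
# access to the EoL API.
from check_nomenclature import taxonomic_checker_list

names = ['Gorilla gorilla', 'Pan troglodytes', 'Pongo pygmaeus']
equivs = taxonomic_checker_list(names, existing_data=None, verbose=True)

# Write the three-column CSV format described above: name, synonyms, status.
with open('equivalents.csv', 'w') as f:
    for taxon in sorted(equivs):
        synonyms, status = equivs[taxon]
        f.write(taxon + ',' + ';'.join(synonyms) + ',' + status + '\n')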
627 | === modified file 'stk/scripts/create_colours_itol.py' | |||
628 | --- stk/scripts/create_colours_itol.py 2014-12-09 10:58:48 +0000 | |||
629 | +++ stk/scripts/create_colours_itol.py 2017-01-12 09:27:31 +0000 | |||
630 | @@ -88,17 +88,8 @@ | |||
631 | 88 | saturation=0.25 | 88 | saturation=0.25 |
632 | 89 | value=0.8 | 89 | value=0.8 |
633 | 90 | 90 | ||
645 | 91 | index = 3 # family | 91 | index = stk.taxonomy_levels.index(level.lower())+1 |
646 | 92 | if (level == "Superfamily"): | 92 | print index |
636 | 93 | index = 4 | ||
637 | 94 | elif (level == "Infraorder"): | ||
638 | 95 | index = 5 | ||
639 | 96 | elif (level == "Suborder"): | ||
640 | 97 | index = 6 | ||
641 | 98 | elif (level == "Order"): | ||
642 | 99 | index = 7 | ||
643 | 100 | elif (level == "Genus"): | ||
644 | 101 | index = 2 | ||
647 | 102 | 93 | ||
648 | 103 | if (tree): | 94 | if (tree): |
649 | 104 | tree_data = stk.import_tree(input_file) | 95 | tree_data = stk.import_tree(input_file) |
650 | 105 | 96 | ||
651 | === modified file 'stk/scripts/create_taxonomy.py' | |||
652 | --- stk/scripts/create_taxonomy.py 2014-03-13 18:45:05 +0000 | |||
653 | +++ stk/scripts/create_taxonomy.py 2017-01-12 09:27:31 +0000 | |||
654 | @@ -16,6 +16,8 @@ | |||
655 | 16 | import supertree_toolkit as stk | 16 | import supertree_toolkit as stk |
656 | 17 | import csv | 17 | import csv |
657 | 18 | 18 | ||
658 | 19 | taxonomy_levels = stk.taxonomy_levels | ||
659 | 20 | |||
660 | 19 | def main(): | 21 | def main(): |
661 | 20 | 22 | ||
662 | 21 | # do stuff | 23 | # do stuff |
663 | @@ -66,13 +68,6 @@ | |||
664 | 66 | f.close() | 68 | f.close() |
665 | 67 | 69 | ||
666 | 68 | taxonomy = {} | 70 | taxonomy = {} |
667 | 69 | # What we get from EOL | ||
668 | 70 | current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom'] | ||
669 | 71 | # And the extra ones from ITIS | ||
670 | 72 | extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom'] | ||
671 | 73 | # all of them in order | ||
672 | 74 | taxonomy_levels = ['species','genus','family','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] | ||
673 | 75 | |||
674 | 76 | 71 | ||
675 | 77 | for taxon in taxa: | 72 | for taxon in taxa: |
676 | 78 | taxon = taxon.replace("_"," ") | 73 | taxon = taxon.replace("_"," ") |
677 | @@ -180,99 +175,8 @@ | |||
678 | 180 | continue | 175 | continue |
679 | 181 | 176 | ||
680 | 182 | 177 | ||
774 | 183 | # Now create the CSV output | 178 | stk.save_taxonomy(taxonomy, output_file) |
775 | 184 | with open(output_file, 'w') as f: | 179 | |
683 | 185 | writer = csv.writer(f) | ||
684 | 186 | writer.writerow(taxonomy_levels) | ||
685 | 187 | for t in taxonomy: | ||
686 | 188 | species = t | ||
687 | 189 | try: | ||
688 | 190 | genus = taxonomy[t]['genus'] | ||
689 | 191 | except KeyError: | ||
690 | 192 | genus = "-" | ||
691 | 193 | try: | ||
692 | 194 | family = taxonomy[t]['family'] | ||
693 | 195 | except KeyError: | ||
694 | 196 | family = "-" | ||
695 | 197 | try: | ||
696 | 198 | superfamily = taxonomy[t]['superfamily'] | ||
697 | 199 | except KeyError: | ||
698 | 200 | superfamily = "-" | ||
699 | 201 | try: | ||
700 | 202 | infraorder = taxonomy[t]['infraorder'] | ||
701 | 203 | except KeyError: | ||
702 | 204 | infraorder = "-" | ||
703 | 205 | try: | ||
704 | 206 | suborder = taxonomy[t]['suborder'] | ||
705 | 207 | except KeyError: | ||
706 | 208 | suborder = "-" | ||
707 | 209 | try: | ||
708 | 210 | order = taxonomy[t]['order'] | ||
709 | 211 | except KeyError: | ||
710 | 212 | order = "-" | ||
711 | 213 | try: | ||
712 | 214 | superorder = taxonomy[t]['superorder'] | ||
713 | 215 | except KeyError: | ||
714 | 216 | superorder = "-" | ||
715 | 217 | try: | ||
716 | 218 | subclass = taxonomy[t]['subclass'] | ||
717 | 219 | except KeyError: | ||
718 | 220 | subclass = "-" | ||
719 | 221 | try: | ||
720 | 222 | tclass = taxonomy[t]['class'] | ||
721 | 223 | except KeyError: | ||
722 | 224 | tclass = "-" | ||
723 | 225 | try: | ||
724 | 226 | subphylum = taxonomy[t]['subphylum'] | ||
725 | 227 | except KeyError: | ||
726 | 228 | subphylum = "-" | ||
727 | 229 | try: | ||
728 | 230 | phylum = taxonomy[t]['phylum'] | ||
729 | 231 | except KeyError: | ||
730 | 232 | phylum = "-" | ||
731 | 233 | try: | ||
732 | 234 | superphylum = taxonomy[t]['superphylum'] | ||
733 | 235 | except KeyError: | ||
734 | 236 | superphylum = "-" | ||
735 | 237 | try: | ||
736 | 238 | infrakingdom = taxonomy[t]['infrakingdom'] | ||
737 | 239 | except: | ||
738 | 240 | infrakingdom = "-" | ||
739 | 241 | try: | ||
740 | 242 | subkingdom = taxonomy[t]['subkingdom'] | ||
741 | 243 | except: | ||
742 | 244 | subkingdom = "-" | ||
743 | 245 | try: | ||
744 | 246 | kingdom = taxonomy[t]['kingdom'] | ||
745 | 247 | except KeyError: | ||
746 | 248 | kingdom = "-" | ||
747 | 249 | try: | ||
748 | 250 | provider = taxonomy[t]['provider'] | ||
749 | 251 | except KeyError: | ||
750 | 252 | provider = "-" | ||
751 | 253 | |||
752 | 254 | |||
753 | 255 | this_classification = [ | ||
754 | 256 | species.encode('utf-8'), | ||
755 | 257 | genus.encode('utf-8'), | ||
756 | 258 | family.encode('utf-8'), | ||
757 | 259 | superfamily.encode('utf-8'), | ||
758 | 260 | infraorder.encode('utf-8'), | ||
759 | 261 | suborder.encode('utf-8'), | ||
760 | 262 | order.encode('utf-8'), | ||
761 | 263 | superorder.encode('utf-8'), | ||
762 | 264 | subclass.encode('utf-8'), | ||
763 | 265 | tclass.encode('utf-8'), | ||
764 | 266 | subphylum.encode('utf-8'), | ||
765 | 267 | phylum.encode('utf-8'), | ||
766 | 268 | superphylum.encode('utf-8'), | ||
767 | 269 | infrakingdom.encode('utf-8'), | ||
768 | 270 | subkingdom.encode('utf-8'), | ||
769 | 271 | kingdom.encode('utf-8'), | ||
770 | 272 | provider.encode('utf-8')] | ||
771 | 273 | writer.writerow(this_classification) | ||
772 | 274 | |||
773 | 275 | |||
776 | 276 | def _uniquify(l): | 180 | def _uniquify(l): |
777 | 277 | """ | 181 | """ |
778 | 278 | Make a list, l, contain only unique data | 182 | Make a list, l, contain only unique data |
779 | 279 | 183 | ||
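The change above collapses the long per-rank try/except ladder into a single stk.save_taxonomy call. As a rough illustration of the dictionary that call consumes: it is keyed by species name, with rank names plus 'provider' as inner keys, following the removed code; the species and rank values below are made up, and the sketch does not claim to reproduce save_taxonomy itself.

# Shape of the taxonomy dictionary built by create_taxonomy.py; the values
# are illustrative, not real lookups.
taxonomy = {
    'Panthera leo': {
        'genus': 'Panthera',
        'family': 'Felidae',
        'order': 'Carnivora',
        'class': 'Mammalia',
        'phylum': 'Chordata',
        'kingdom': 'Animalia',
        'provider': 'ITIS',  # which database supplied the classification
    },
}

# Ranks a lookup did not return are simply absent; the removed inline CSV
# writer substituted '-' for them, along these lines:
row = [taxonomy['Panthera leo'].get(level, '-')
       for level in ['genus', 'family', 'superfamily', 'order', 'kingdom']]
# stk.save_taxonomy(taxonomy, output_file) now takes care of writing the CSV.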
780 | === modified file 'stk/scripts/fill_in_with_taxonomy.py' | |||
781 | --- stk/scripts/fill_in_with_taxonomy.py 2016-12-14 16:22:12 +0000 | |||
782 | +++ stk/scripts/fill_in_with_taxonomy.py 2017-01-12 09:27:31 +0000 | |||
783 | @@ -23,21 +23,90 @@ | |||
784 | 23 | from urllib import quote_plus | 23 | from urllib import quote_plus |
785 | 24 | import simplejson as json | 24 | import simplejson as json |
786 | 25 | import argparse | 25 | import argparse |
787 | 26 | import copy | ||
788 | 26 | import os | 27 | import os |
789 | 27 | import sys | 28 | import sys |
790 | 28 | stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir ) | 29 | stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir ) |
791 | 29 | sys.path.insert(0, stk_path) | 30 | sys.path.insert(0, stk_path) |
792 | 30 | import supertree_toolkit as stk | 31 | import supertree_toolkit as stk |
793 | 31 | import csv | 32 | import csv |
803 | 32 | 33 | from ete2 import Tree | |
804 | 33 | # What we get from EOL | 34 | import tempfile |
805 | 34 | current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom'] | 35 | import re |
806 | 35 | # And the extra ones from ITIS | 36 | |
807 | 36 | extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom'] | 37 | taxonomy_levels = stk.taxonomy_levels |
808 | 37 | # all of them in order | 38 | #tlevels = ['species','genus','family','superfamily','suborder','order','class','phylum','kingdom'] |
809 | 38 | taxonomy_levels = ['species','genus','subfamily','family','tribe','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] | 39 | tlevels = ['species','genus', 'subfamily', 'family','infraorder','order','class','phylum','kingdom'] |
810 | 39 | 40 | ||
811 | 40 | def get_tree_taxa_taxonomy(taxon,wsdlObjectWoRMS): | 41 | def get_tree_taxa_taxonomy_eol(taxon): |
812 | 42 | |||
813 | 43 | taxonq = quote_plus(taxon) | ||
814 | 44 | URL = "http://eol.org/api/search/1.0.json?q="+taxonq | ||
815 | 45 | req = urllib2.Request(URL) | ||
816 | 46 | opener = urllib2.build_opener() | ||
817 | 47 | f = opener.open(req) | ||
818 | 48 | data = json.load(f) | ||
819 | 49 | |||
820 | 50 | if data['results'] == []: | ||
821 | 51 | return {} | ||
822 | 52 | ID = str(data['results'][0]['id']) # take first hit | ||
823 | 53 | # Now look for taxonomies | ||
824 | 54 | URL = "http://eol.org/api/pages/1.0/"+ID+".json" | ||
825 | 55 | req = urllib2.Request(URL) | ||
826 | 56 | opener = urllib2.build_opener() | ||
827 | 57 | f = opener.open(req) | ||
828 | 58 | data = json.load(f) | ||
829 | 59 | if len(data['taxonConcepts']) == 0: | ||
830 | 60 | return {} | ||
831 | 61 | TID = str(data['taxonConcepts'][0]['identifier']) # take first hit | ||
832 | 62 | currentdb = str(data['taxonConcepts'][0]['nameAccordingTo']) | ||
833 | 63 | # loop through and get preferred one if specified | ||
834 | 64 | # now get taxonomy | ||
835 | 65 | for db in data['taxonConcepts']: | ||
836 | 66 | currentdb = db['nameAccordingTo'].lower() | ||
837 | 67 | TID = str(db['identifier']) | ||
838 | 68 | break | ||
839 | 69 | URL="http://eol.org/api/hierarchy_entries/1.0/"+TID+".json" | ||
840 | 70 | req = urllib2.Request(URL) | ||
841 | 71 | opener = urllib2.build_opener() | ||
842 | 72 | f = opener.open(req) | ||
843 | 73 | data = json.load(f) | ||
844 | 74 | tax_array = {} | ||
845 | 75 | tax_array['provider'] = currentdb | ||
846 | 76 | for a in data['ancestors']: | ||
847 | 77 | try: | ||
848 | 78 | if a.has_key('taxonRank') : | ||
849 | 79 | temp_level = a['taxonRank'].encode("ascii","ignore") | ||
850 | 80 | if (temp_level in taxonomy_levels): | ||
851 | 81 | # note the dump into ASCII | ||
852 | 82 | temp_name = a['scientificName'].encode("ascii","ignore") | ||
853 | 83 | temp_name = temp_name.split(" ") | ||
854 | 84 | if (temp_level == 'species'): | ||
855 | 85 | tax_array[temp_level] = "_".join(temp_name[0:2]) | ||
856 | 86 | |||
857 | 87 | else: | ||
858 | 88 | tax_array[temp_level] = temp_name[0] | ||
859 | 89 | except KeyError as e: | ||
860 | 90 | logging.exception("Key not found: taxonRank") | ||
861 | 91 | continue | ||
862 | 92 | try: | ||
863 | 93 | # add taxonomy in to the taxonomy! | ||
864 | 94 | # some issues here, so let's make sure it's OK | ||
865 | 95 | temp_name = taxon.split(" ") | ||
866 | 96 | if data.has_key('taxonRank') : | ||
867 | 97 | if not data['taxonRank'].lower() == 'species': | ||
868 | 98 | tax_array[data['taxonRank'].lower()] = temp_name[0] | ||
869 | 99 | else: | ||
870 | 100 | tax_array[data['taxonRank'].lower()] = ' '.join(temp_name[0:2]) | ||
871 | 101 | except KeyError as e: | ||
872 | 102 | return tax_array | ||
873 | 103 | |||
874 | 104 | return tax_array | ||
875 | 105 | |||
876 | 106 | def get_tree_taxa_taxonomy_worms(taxon): | ||
877 | 107 | |||
878 | 108 | from SOAPpy import WSDL | ||
879 | 109 | wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1') | ||
880 | 41 | 110 | ||
881 | 42 | taxon_data = wsdlObjectWoRMS.getAphiaRecords(taxon.replace('_',' ')) | 111 | taxon_data = wsdlObjectWoRMS.getAphiaRecords(taxon.replace('_',' ')) |
882 | 43 | if taxon_data == None: | 112 | if taxon_data == None: |
883 | @@ -51,6 +120,8 @@ | |||
884 | 51 | classification = wsdlObjectWoRMS.getAphiaClassificationByID(taxon_id) | 120 | classification = wsdlObjectWoRMS.getAphiaClassificationByID(taxon_id) |
885 | 52 | # construct array | 121 | # construct array |
886 | 53 | tax_array = {} | 122 | tax_array = {} |
887 | 123 | if (classification == ""): | ||
888 | 124 | return {} | ||
889 | 54 | # classification is a nested dictionary, so we need to iterate down it | 125 | # classification is a nested dictionary, so we need to iterate down it |
890 | 55 | current_child = classification.child | 126 | current_child = classification.child |
891 | 56 | while True: | 127 | while True: |
892 | @@ -60,27 +131,252 @@ | |||
893 | 60 | break | 131 | break |
894 | 61 | return tax_array | 132 | return tax_array |
895 | 62 | 133 | ||
899 | 63 | 134 | def get_tree_taxa_taxonomy_itis(taxon): | |
900 | 64 | 135 | ||
901 | 65 | def get_taxonomy_worms(taxonomy, start_otu): | 136 | URL="http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(taxon.replace('_',' ').strip()) |
902 | 137 | req = urllib2.Request(URL) | ||
903 | 138 | opener = urllib2.build_opener() | ||
904 | 139 | f = opener.open(req) | ||
905 | 140 | string = unicode(f.read(),"ISO-8859-1") | ||
906 | 141 | this_item = json.loads(string) | ||
907 | 142 | if this_item['scientificNames'] == [None]: # not found | ||
908 | 143 | return {} | ||
909 | 144 | tsn = this_item['scientificNames'][0]['tsn'] # there might be records that aren't valid - they point to the valid one though | ||
910 | 145 | # so call another function to get any valid names | ||
911 | 146 | URL="http://www.itis.gov/ITISWebService/jsonservice/getAcceptedNamesFromTSN?tsn="+tsn | ||
912 | 147 | req = urllib2.Request(URL) | ||
913 | 148 | opener = urllib2.build_opener() | ||
914 | 149 | f = opener.open(req) | ||
915 | 150 | string = unicode(f.read(),"ISO-8859-1") | ||
916 | 151 | this_item = json.loads(string) | ||
917 | 152 | if not this_item['acceptedNames'] == [None]: | ||
918 | 153 | tsn = this_item['acceptedNames'][0]['acceptedTsn'] | ||
919 | 154 | |||
920 | 155 | URL="http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+str(tsn) | ||
921 | 156 | req = urllib2.Request(URL) | ||
922 | 157 | opener = urllib2.build_opener() | ||
923 | 158 | f = opener.open(req) | ||
924 | 159 | string = unicode(f.read(),"ISO-8859-1") | ||
925 | 160 | data = json.loads(string) | ||
926 | 161 | # construct array | ||
927 | 162 | this_taxonomy = {} | ||
928 | 163 | for level in data['hierarchyList']: | ||
929 | 164 | if level['rankName'].lower() in taxonomy_levels: | ||
930 | 165 | # note the dump into ASCII | ||
931 | 166 | this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore") | ||
932 | 167 | |||
933 | 168 | return this_taxonomy | ||
934 | 169 | |||
935 | 170 | |||
936 | 171 | |||
937 | 172 | def get_taxonomy_eol(taxonomy, start_otu, verbose,tmpfile=None,skip=False): | ||
938 | 173 | |||
939 | 174 | # this is the recursive function | ||
940 | 175 | def get_children(taxonomy, ID, aphiaIDsDone): | ||
941 | 176 | |||
942 | 177 | # get data | ||
943 | 178 | URL="http://eol.org/api/hierarchy_entries/1.0/"+str(ID)+".json?common_names=false&synonyms=false&cache_ttl=" | ||
944 | 179 | req = urllib2.Request(URL) | ||
945 | 180 | opener = urllib2.build_opener() | ||
946 | 181 | f = opener.open(req) | ||
947 | 182 | string = unicode(f.read(),"ISO-8859-1") | ||
948 | 183 | this_item = json.loads(string) | ||
949 | 184 | if this_item == None: | ||
950 | 185 | return taxonomy | ||
951 | 186 | if this_item['taxonRank'].lower().strip() == 'species': | ||
952 | 187 | # add data to taxonomy dictionary | ||
953 | 188 | taxon = this_item['scientificName'].split()[0:2] # just the first two words | ||
954 | 189 | taxon = " ".join(taxon[0:2]) | ||
955 | 190 | # NOTE following line means existing items are *not* updated | ||
956 | 191 | if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy | ||
957 | 192 | this_taxonomy = {} | ||
958 | 193 | for level in this_item['ancestors']: | ||
959 | 194 | if level['taxonRank'].lower() in taxonomy_levels: | ||
960 | 195 | # note the dump into ASCII | ||
961 | 196 | this_taxonomy[level['taxonRank'].lower().encode("ascii","ignore")] = level['scientificName'].encode("ascii","ignore") | ||
962 | 197 | # add species: | ||
963 | 198 | this_taxonomy['species'] = taxon.replace(" ","_") | ||
964 | 199 | if verbose: | ||
965 | 200 | print "\tAdding "+taxon | ||
966 | 201 | taxonomy[taxon] = this_taxonomy | ||
967 | 202 | if not tmpfile == None: | ||
968 | 203 | stk.save_taxonomy(taxonomy,tmpfile) | ||
969 | 204 | return taxonomy | ||
970 | 205 | else: | ||
971 | 206 | return taxonomy | ||
972 | 207 | all_children = [] | ||
973 | 208 | for level in this_item['children']: | ||
974 | 209 | if not level == None: | ||
975 | 210 | all_children.append(level['taxonID']) | ||
976 | 211 | |||
977 | 212 | if (len(all_children) == 0): | ||
978 | 213 | return taxonomy | ||
979 | 214 | |||
980 | 215 | for child in all_children: | ||
981 | 216 | if child in aphiaIDsDone: # we get stuck sometime | ||
982 | 217 | continue | ||
983 | 218 | aphiaIDsDone.append(child) | ||
984 | 219 | taxonomy = get_children(taxonomy, child, aphiaIDsDone) | ||
985 | 220 | return taxonomy | ||
986 | 221 | |||
987 | 222 | |||
988 | 223 | # main bit of the get_taxonomy_eol function | ||
989 | 224 | taxonq = quote_plus(start_otu) | ||
990 | 225 | URL = "http://eol.org/api/search/1.0.json?q="+taxonq | ||
991 | 226 | req = urllib2.Request(URL) | ||
992 | 227 | opener = urllib2.build_opener() | ||
993 | 228 | f = opener.open(req) | ||
994 | 229 | data = json.load(f) | ||
995 | 230 | start_id = str(data['results'][0]['id']) # this is the page ID. We get the species ID next | ||
996 | 231 | URL = "http://eol.org/api/pages/1.0/"+start_id+".json" | ||
997 | 232 | req = urllib2.Request(URL) | ||
998 | 233 | opener = urllib2.build_opener() | ||
999 | 234 | f = opener.open(req) | ||
1000 | 235 | data = json.load(f) | ||
1001 | 236 | if len(data['taxonConcepts']) == 0: | ||
1002 | 237 | print "Error finding you start taxa. Spelling?" | ||
1003 | 238 | return None | ||
1004 | 239 | start_id = data['taxonConcepts'][0]['identifier'] | ||
1005 | 240 | start_taxonomy_level = data['taxonConcepts'][0]['taxonRank'].lower() | ||
1006 | 241 | |||
1007 | 242 | aphiaIDsDone = [] | ||
1008 | 243 | if not skip: | ||
1009 | 244 | taxonomy = get_children(taxonomy,start_id,aphiaIDsDone) | ||
1010 | 245 | |||
1011 | 246 | return taxonomy, start_taxonomy_level | ||
1012 | 247 | |||
1013 | 248 | |||
1014 | 249 | |||
1015 | 250 | def get_taxonomy_itis(taxonomy, start_otu, verbose,tmpfile=None,skip=False): | ||
1016 | 251 | import simplejson as json | ||
1017 | 252 | |||
1018 | 253 | # this is the recursive function | ||
1019 | 254 | def get_children(taxonomy, ID, aphiaIDsDone): | ||
1020 | 255 | |||
1021 | 256 | # get data | ||
1022 | 257 | URL="http://www.itis.gov/ITISWebService/jsonservice/getFullRecordFromTSN?tsn="+ID | ||
1023 | 258 | req = urllib2.Request(URL) | ||
1024 | 259 | opener = urllib2.build_opener() | ||
1025 | 260 | f = opener.open(req) | ||
1026 | 261 | string = unicode(f.read(),"ISO-8859-1") | ||
1027 | 262 | this_item = json.loads(string) | ||
1028 | 263 | if this_item == None: | ||
1029 | 264 | return taxonomy | ||
1030 | 265 | if not this_item['usage']['taxonUsageRating'].lower() == 'valid': | ||
1031 | 266 | print "rejecting " , this_item['scientificName']['combinedName'] | ||
1032 | 267 | return taxonomy | ||
1033 | 268 | if this_item['taxRank']['rankName'].lower().strip() == 'species': | ||
1034 | 269 | # add data to taxonomy dictionary | ||
1035 | 270 | taxon = this_item['scientificName']['combinedName'] | ||
1036 | 271 | # NOTE following line means existing items are *not* updated | ||
1037 | 272 | if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy | ||
1038 | 273 | # get the taxonomy of this species | ||
1039 | 274 | tsn = this_item["scientificName"]["tsn"] | ||
1040 | 275 | URL="http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+tsn | ||
1041 | 276 | req = urllib2.Request(URL) | ||
1042 | 277 | opener = urllib2.build_opener() | ||
1043 | 278 | f = opener.open(req) | ||
1044 | 279 | string = unicode(f.read(),"ISO-8859-1") | ||
1045 | 280 | data = json.loads(string) | ||
1046 | 281 | this_taxonomy = {} | ||
1047 | 282 | for level in data['hierarchyList']: | ||
1048 | 283 | if level['rankName'].lower() in taxonomy_levels: | ||
1049 | 284 | # note the dump into ASCII | ||
1050 | 285 | this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore") | ||
1051 | 286 | if verbose: | ||
1052 | 287 | print "\tAdding "+taxon | ||
1053 | 288 | taxonomy[taxon] = this_taxonomy | ||
1054 | 289 | if not tmpfile == None: | ||
1055 | 290 | stk.save_taxonomy(taxonomy,tmpfile) | ||
1056 | 291 | return taxonomy | ||
1057 | 292 | else: | ||
1058 | 293 | return taxonomy | ||
1059 | 294 | |||
1060 | 295 | all_children = [] | ||
1061 | 296 | URL="http://www.itis.gov/ITISWebService/jsonservice/getHierarchyDownFromTSN?tsn="+ID | ||
1062 | 297 | req = urllib2.Request(URL) | ||
1063 | 298 | opener = urllib2.build_opener() | ||
1064 | 299 | f = opener.open(req) | ||
1065 | 300 | string = unicode(f.read(),"ISO-8859-1") | ||
1066 | 301 | this_item = json.loads(string) | ||
1067 | 302 | if this_item == None: | ||
1068 | 303 | return taxonomy | ||
1069 | 304 | |||
1070 | 305 | for level in this_item['hierarchyList']: | ||
1071 | 306 | if not level == None: | ||
1072 | 307 | all_children.append(level['tsn']) | ||
1073 | 308 | |||
1074 | 309 | if (len(all_children) == 0): | ||
1075 | 310 | return taxonomy | ||
1076 | 311 | |||
1077 | 312 | for child in all_children: | ||
1078 | 313 | if child in aphiaIDsDone: # we get stuck sometime | ||
1079 | 314 | continue | ||
1080 | 315 | aphiaIDsDone.append(child) | ||
1081 | 316 | taxonomy = get_children(taxonomy, child, aphiaIDsDone) | ||
1082 | 317 | |||
1083 | 318 | return taxonomy | ||
1084 | 319 | |||
1085 | 320 | |||
1086 | 321 | # main bit of the get_taxonomy_worms function | ||
1087 | 322 | URL="http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(start_otu.strip()) | ||
1088 | 323 | req = urllib2.Request(URL) | ||
1089 | 324 | opener = urllib2.build_opener() | ||
1090 | 325 | f = opener.open(req) | ||
1091 | 326 | string = unicode(f.read(),"ISO-8859-1") | ||
1092 | 327 | this_item = json.loads(string) | ||
1093 | 328 | start_id = this_item['scientificNames'][0]['tsn'] # there might be records that aren't valid - they point to the valid one though | ||
1094 | 329 | # call it again via the ID this time to make sure we've got the right one. | ||
1095 | 330 | # so call another function to get any valid names | ||
1096 | 331 | URL="http://www.itis.gov/ITISWebService/jsonservice/getAcceptedNamesFromTSN?tsn="+start_id | ||
1097 | 332 | req = urllib2.Request(URL) | ||
1098 | 333 | opener = urllib2.build_opener() | ||
1099 | 334 | f = opener.open(req) | ||
1100 | 335 | string = unicode(f.read(),"ISO-8859-1") | ||
1101 | 336 | this_item = json.loads(string) | ||
1102 | 337 | if not this_item['acceptedNames'] == [None]: | ||
1103 | 338 | start_id = this_item['acceptedNames'][0]['acceptedTsn'] | ||
1104 | 339 | |||
1105 | 340 | URL="http://www.itis.gov/ITISWebService/jsonservice/getFullRecordFromTSN?tsn="+start_id | ||
1106 | 341 | req = urllib2.Request(URL) | ||
1107 | 342 | opener = urllib2.build_opener() | ||
1108 | 343 | f = opener.open(req) | ||
1109 | 344 | string = unicode(f.read(),"ISO-8859-1") | ||
1110 | 345 | this_item = json.loads(string) | ||
1111 | 346 | start_taxonomy_level = this_item['taxRank']['rankName'].lower() | ||
1112 | 347 | |||
1113 | 348 | aphiaIDsDone = [] | ||
1114 | 349 | if not skip: | ||
1115 | 350 | taxonomy = get_children(taxonomy,start_id,aphiaIDsDone) | ||
1116 | 351 | |||
1117 | 352 | return taxonomy, start_taxonomy_level | ||
1118 | 353 | |||
1119 | 354 | |||
1120 | 355 | |||
1121 | 356 | |||
1122 | 357 | def get_taxonomy_worms(taxonomy, start_otu, verbose,tmpfile=None,skip=False): | ||
1123 | 66 | """ Gets and processes a taxon from the queue to get its taxonomy.""" | 358 | """ Gets and processes a taxon from the queue to get its taxonomy.""" |
1124 | 67 | from SOAPpy import WSDL | 359 | from SOAPpy import WSDL |
1125 | 68 | 360 | ||
1126 | 69 | wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1') | 361 | wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1') |
1127 | 70 | 362 | ||
1128 | 71 | # this is the recursive function | 363 | # this is the recursive function |
1130 | 72 | def get_children(taxonomy, ID): | 364 | def get_children(taxonomy, ID, aphiaIDsDone): |
1131 | 73 | 365 | ||
1132 | 74 | # get data | 366 | # get data |
1133 | 75 | this_item = wsdlObjectWoRMS.getAphiaRecordByID(ID) | 367 | this_item = wsdlObjectWoRMS.getAphiaRecordByID(ID) |
1134 | 76 | if this_item == None: | 368 | if this_item == None: |
1135 | 77 | return taxonomy | 369 | return taxonomy |
1136 | 370 | if not this_item['status'].lower() == 'accepted': | ||
1137 | 371 | print "rejecting " , this_item.valid_name | ||
1138 | 372 | return taxonomy | ||
1139 | 78 | if this_item['rank'].lower() == 'species': | 373 | if this_item['rank'].lower() == 'species': |
1140 | 79 | # add data to taxonomy dictionary | 374 | # add data to taxonomy dictionary |
1144 | 80 | # get the taxonomy of this species | 375 | taxon = this_item.valid_name |
1145 | 81 | classification = wsdlObjectWoRMS.getAphiaClassificationByID(ID) | 376 | # NOTE following line means existing items are *not* updated |
1143 | 82 | taxon = this_item.scientificname | ||
1146 | 83 | if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy | 377 | if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy |
1147 | 378 | # get the taxonomy of this species | ||
1148 | 379 | classification = wsdlObjectWoRMS.getAphiaClassificationByID(ID) | ||
1149 | 84 | # construct array | 380 | # construct array |
1150 | 85 | tax_array = {} | 381 | tax_array = {} |
1151 | 86 | # classification is a nested dictionary, so we need to iterate down it | 382 | # classification is a nested dictionary, so we need to iterate down it |
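The ITIS branch above builds every request the same way: assemble a jsonservice URL, open it with urllib2, decode the ISO-8859-1 payload and json.loads it. A small helper capturing that sequence (a sketch only - itis_get is not part of the toolkit; the endpoint names are the ones already used above):

    import json
    import urllib2
    from urllib import quote_plus

    ITIS_BASE = "http://www.itis.gov/ITISWebService/jsonservice/"

    def itis_get(endpoint, **params):
        """Fetch one ITIS jsonservice endpoint and return the decoded JSON object."""
        query = "&".join("%s=%s" % (k, quote_plus(str(v))) for k, v in params.items())
        req = urllib2.Request(ITIS_BASE + endpoint + "?" + query)
        f = urllib2.build_opener().open(req)
        return json.loads(unicode(f.read(), "ISO-8859-1"))

    # e.g. hierarchy = itis_get("getFullHierarchyFromTSN", tsn=tsn)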
1152 | @@ -92,16 +388,36 @@ | |||
1153 | 92 | current_child = current_child.child | 388 | current_child = current_child.child |
1154 | 93 | if current_child == '': # empty one is a string for some reason | 389 | if current_child == '': # empty one is a string for some reason |
1155 | 94 | break | 390 | break |
1157 | 95 | taxonomy[this_item.scientificname] = tax_array | 391 | if verbose: |
1158 | 392 | print "\tAdding "+this_item.scientificname | ||
1159 | 393 | taxonomy[this_item.valid_name] = tax_array | ||
1160 | 394 | if not tmpfile == None: | ||
1161 | 395 | stk.save_taxonomy(taxonomy,tmpfile) | ||
1162 | 96 | return taxonomy | 396 | return taxonomy |
1163 | 97 | else: | 397 | else: |
1164 | 98 | return taxonomy | 398 | return taxonomy |
1165 | 99 | 399 | ||
1171 | 100 | children = wsdlObjectWoRMS.getAphiaChildrenByID(ID, 1, False) | 400 | all_children = [] |
1172 | 101 | 401 | start = 1 | |
1173 | 102 | for child in children: | 402 | while True: |
1174 | 103 | taxonomy = get_children(taxonomy, child['valid_AphiaID']) | 403 | children = wsdlObjectWoRMS.getAphiaChildrenByID(ID, start, False) |
1175 | 104 | 404 | if (children is None or children == None): | |
1176 | 405 | break | ||
1177 | 406 | if (len(children) < 50): | ||
1178 | 407 | all_children.extend(children) | ||
1179 | 408 | break | ||
1180 | 409 | all_children.extend(children) | ||
1181 | 410 | start += 50 | ||
1182 | 411 | |||
1183 | 412 | if (len(all_children) == 0): | ||
1184 | 413 | return taxonomy | ||
1185 | 414 | |||
1186 | 415 | for child in all_children: | ||
1187 | 416 | if child['valid_AphiaID'] in aphiaIDsDone: # we get stuck sometimes | ||
1188 | 417 | continue | ||
1189 | 418 | aphiaIDsDone.append(child['valid_AphiaID']) | ||
1190 | 419 | taxonomy = get_children(taxonomy, child['valid_AphiaID'], aphiaIDsDone) | ||
1191 | 420 | |||
1192 | 105 | return taxonomy | 421 | return taxonomy |
1193 | 106 | 422 | ||
1194 | 107 | 423 | ||
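getAphiaChildrenByID returns at most 50 records per call, so the loop above keeps asking for the next offset until it gets a short or empty page. The same pattern in isolation, with fetch_page standing in for the SOAP call (a hypothetical helper, not toolkit code):

    def fetch_all_children(fetch_page, parent_id, page_size=50):
        """Collect every child record by requesting successive pages."""
        all_children = []
        offset = 1
        while True:
            page = fetch_page(parent_id, offset)   # like getAphiaChildrenByID(ID, offset, False)
            if page is None:
                break
            all_children.extend(page)
            if len(page) < page_size:
                break                              # a short page means we have reached the end
            offset += page_size
        return all_children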
1195 | @@ -111,12 +427,17 @@ | |||
1196 | 111 | start_id = start_taxa[0]['valid_AphiaID'] # there might be records that aren't valid - they point to the valid one though | 427 | start_id = start_taxa[0]['valid_AphiaID'] # there might be records that aren't valid - they point to the valid one though |
1197 | 112 | # call it again via the ID this time to make sure we've got the right one. | 428 | # call it again via the ID this time to make sure we've got the right one. |
1198 | 113 | start_taxa = wsdlObjectWoRMS.getAphiaRecordByID(start_id) | 429 | start_taxa = wsdlObjectWoRMS.getAphiaRecordByID(start_id) |
1202 | 114 | start_taxonomy_level = start_taxa['rank'].lower() | 430 | if start_taxa == None: |
1203 | 115 | except HTTPError: | 431 | start_taxonomy_level = 'infraorder' |
1204 | 116 | print "Error" | 432 | else: |
1205 | 433 | start_taxonomy_level = start_taxa['rank'].lower() | ||
1206 | 434 | except urllib2.HTTPError: | ||
1207 | 435 | print "Error finding start_otu taxonomic level. Do you have an internet connection?" | ||
1208 | 117 | sys.exit(-1) | 436 | sys.exit(-1) |
1209 | 118 | 437 | ||
1211 | 119 | taxonomy = get_children(taxonomy,start_id) | 438 | aphiaIDsDone = [] |
1212 | 439 | if not skip: | ||
1213 | 440 | taxonomy = get_children(taxonomy,start_id,aphiaIDsDone) | ||
1214 | 120 | 441 | ||
1215 | 121 | return taxonomy, start_taxonomy_level | 442 | return taxonomy, start_taxonomy_level |
1216 | 122 | 443 | ||
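Both services can hand back a child whose record points at something already visited, so get_children now carries the list of IDs it has seen (aphiaIDsDone) and skips repeats. The guard on its own, written as a generic depth-first walk (a sketch under that assumption, not toolkit code):

    def walk(node_id, children_of, visit, seen=None):
        """Depth-first walk that never revisits an ID, so cyclic references
        in the upstream data cannot send the recursion round in circles."""
        if seen is None:
            seen = set()
        visit(node_id)
        for child_id in children_of(node_id):
            if child_id in seen:
                continue
            seen.add(child_id)
            walk(child_id, children_of, visit, seen)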
1217 | @@ -136,9 +457,16 @@ | |||
1218 | 136 | default=False | 457 | default=False |
1219 | 137 | ) | 458 | ) |
1220 | 138 | parser.add_argument( | 459 | parser.add_argument( |
1221 | 460 | '-s', | ||
1222 | 461 | '--skip', | ||
1223 | 462 | action='store_true', | ||
1224 | 463 | help="Skip online checking, just use taxonomy files", | ||
1225 | 464 | default=False | ||
1226 | 465 | ) | ||
1227 | 466 | parser.add_argument( | ||
1228 | 139 | '--pref_db', | 467 | '--pref_db', |
1229 | 140 | help="Taxonomy database to use. Default is Species 2000/ITIS", | 468 | help="Taxonomy database to use. Default is Species 2000/ITIS", |
1231 | 141 | choices=['itis', 'worms', 'ncbi'], | 469 | choices=['itis', 'worms', 'ncbi', 'eol'], |
1232 | 142 | default = 'worms' | 470 | default = 'worms' |
1233 | 143 | ) | 471 | ) |
1234 | 144 | parser.add_argument( | 472 | parser.add_argument( |
1235 | @@ -178,58 +506,250 @@ | |||
1236 | 178 | top_level = args.top_level[0] | 506 | top_level = args.top_level[0] |
1237 | 179 | save_taxonomy_file = args.save_taxonomy | 507 | save_taxonomy_file = args.save_taxonomy |
1238 | 180 | tree_taxonomy = args.tree_taxonomy | 508 | tree_taxonomy = args.tree_taxonomy |
1239 | 509 | taxonomy = args.taxonomy_from_file | ||
1240 | 181 | pref_db = args.pref_db | 510 | pref_db = args.pref_db |
1241 | 511 | skip = args.skip | ||
1242 | 182 | if (save_taxonomy_file == None): | 512 | if (save_taxonomy_file == None): |
1243 | 183 | save_taxonomy = False | 513 | save_taxonomy = False |
1244 | 184 | else: | 514 | else: |
1245 | 185 | save_taxonomy = True | 515 | save_taxonomy = True |
1246 | 516 | load_tree_taxonomy = False | ||
1247 | 517 | if (not tree_taxonomy == None): | ||
1248 | 518 | tree_taxonomy_file = tree_taxonomy | ||
1249 | 519 | load_tree_taxonomy = True | ||
1250 | 520 | if skip: | ||
1251 | 521 | if taxonomy == None: | ||
1252 | 522 | print "Error: If you're skipping checking online, then you need to supply taxonomy files" | ||
1253 | 523 | return | ||
1254 | 186 | 524 | ||
1255 | 187 | # grab taxa in tree | 525 | # grab taxa in tree |
1256 | 188 | tree = stk.import_tree(input_file) | 526 | tree = stk.import_tree(input_file) |
1257 | 189 | taxa_list = stk._getTaxaFromNewick(tree) | 527 | taxa_list = stk._getTaxaFromNewick(tree) |
1262 | 190 | 528 | ||
1263 | 191 | taxonomy = {} | 529 | if verbose: |
1264 | 192 | 530 | print "Taxa count for input tree: ", len(taxa_list) | |
1265 | 193 | # we're going to add the taxa in the tree to the taxonomy, to stop them | 531 | |
1266 | 532 | # load in any taxonomy files - we still call the APIs as a) they may have updated data and | ||
1267 | 533 | # b) the user may have missed some first time round (i.e. expanded the tree and not redone | ||
1268 | 534 | # the taxonomy) | ||
1269 | 535 | if (taxonomy == None): | ||
1270 | 536 | taxonomy = {} | ||
1271 | 537 | else: | ||
1272 | 538 | taxonomy = stk.load_taxonomy(taxonomy) | ||
1273 | 539 | tree_taxonomy = {} | ||
1274 | 540 | # this might also have tree_taxonomy in too - let's check this | ||
1275 | 541 | for t in taxa_list: | ||
1276 | 542 | if t in taxonomy: | ||
1277 | 543 | tree_taxonomy[t] = taxonomy[t] | ||
1278 | 544 | elif t.replace("_"," ") in taxonomy: | ||
1279 | 545 | tree_taxonomy[t] = taxonomy[t.replace("_"," ")] | ||
1280 | 546 | |||
1281 | 547 | if (load_tree_taxonomy): # overwrite the good work above... | ||
1282 | 548 | tree_taxonomy = stk.load_taxonomy(tree_taxonomy_file) | ||
1283 | 549 | if (tree_taxonomy == None): | ||
1284 | 550 | tree_taxonomy = {} | ||
1285 | 551 | |||
1286 | 552 | # we're going to add the taxa in the tree to the main WORMS taxonomy, to stop them | ||
1287 | 194 | # being fetched in first place. We delete them later | 553 | # being fetched in first place. We delete them later |
1288 | 554 | # If you've loaded a taxonomy created by this script, this overwrites the tree taxa in the main taxonomy dict | ||
1289 | 555 | # Don't worry, we put them back in before saving again! | ||
1290 | 195 | for taxon in taxa_list: | 556 | for taxon in taxa_list: |
1291 | 196 | taxon = taxon.replace('_',' ') | 557 | taxon = taxon.replace('_',' ') |
1294 | 197 | taxonomy[taxon] = [] | 558 | taxonomy[taxon] = {} |
1293 | 198 | |||
1295 | 199 | 559 | ||
1296 | 200 | if (pref_db == 'itis'): | 560 | if (pref_db == 'itis'): |
1297 | 201 | # get taxonomy info from itis | 561 | # get taxonomy info from itis |
1300 | 202 | print "Sorry, ITIS is not implemented yet" | 562 | if (verbose): |
1301 | 203 | pass | 563 | print "Getting data from ITIS" |
1302 | 564 | if (verbose): | ||
1303 | 565 | print "Dealing with taxa in tree" | ||
1304 | 566 | for t in taxa_list: | ||
1305 | 567 | if verbose: | ||
1306 | 568 | print "\t"+t | ||
1307 | 569 | if not(t in tree_taxonomy or t.replace("_"," ") in tree_taxonomy): | ||
1308 | 570 | # we don't have data - NOTE we assume things are *not* updated here if we do | ||
1309 | 571 | tree_taxonomy[t] = get_tree_taxa_taxonomy_itis(t) | ||
1310 | 572 | |||
1311 | 573 | if save_taxonomy: | ||
1312 | 574 | if (verbose): | ||
1313 | 575 | print "Saving tree taxonomy" | ||
1314 | 576 | # note -temporary save as we overwrite this file later. | ||
1315 | 577 | stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv') | ||
1316 | 578 | |||
1317 | 579 | # get taxonomy from ITIS | ||
1318 | 580 | if verbose: | ||
1319 | 581 | print "Now dealing with all other taxa - this might take a while..." | ||
1320 | 582 | # create a temp file so we can checkpoint and continue | ||
1321 | 583 | tmpf, tmpfile = tempfile.mkstemp() | ||
1322 | 584 | |||
1323 | 585 | if os.path.isfile('.fit_lock'): | ||
1324 | 586 | f = open('.fit_lock','r') | ||
1325 | 587 | tf = f.read() | ||
1326 | 588 | f.close() | ||
1327 | 589 | if os.path.isfile(tf.strip()): | ||
1328 | 590 | taxonomy = stk.load_taxonomy(tf.strip()) | ||
1329 | 591 | os.remove('.fit_lock') | ||
1330 | 592 | |||
1331 | 593 | # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue | ||
1332 | 594 | # where we left off. | ||
1333 | 595 | with open(".fit_lock", 'w') as f: | ||
1334 | 596 | f.write(tmpfile) | ||
1335 | 597 | # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function | ||
1336 | 598 | taxonomy, start_level = get_taxonomy_itis(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there | ||
1337 | 599 | |||
1338 | 600 | # clean up | ||
1339 | 601 | os.close(tmpf) | ||
1340 | 602 | os.remove('.fit_lock') | ||
1341 | 603 | try: | ||
1342 | 604 | os.remove(tmpfile) | ||
1343 | 605 | except OSError: | ||
1344 | 606 | pass | ||
1345 | 204 | elif (pref_db == 'worms'): | 607 | elif (pref_db == 'worms'): |
1346 | 608 | if (verbose): | ||
1347 | 609 | print "Getting data from WoRMS" | ||
1348 | 205 | # get tree taxonomy from worms | 610 | # get tree taxonomy from worms |
1357 | 206 | if (tree_taxonomy == None): | 611 | if (verbose): |
1358 | 207 | tree_taxonomy = {} | 612 | print "Dealing with taxa in tree" |
1359 | 208 | for t in taxa_list: | 613 | |
1360 | 209 | from SOAPpy import WSDL | 614 | for t in taxa_list: |
1361 | 210 | wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1') | 615 | if verbose: |
1362 | 211 | tree_taxonomy[t] = get_tree_taxa_taxonomy(t,wsdlObjectWoRMS) | 616 | print "\t"+t |
1363 | 212 | else: | 617 | if not(t in tree_taxonomy or t.replace("_"," ") in tree_taxonomy): |
1364 | 213 | tree_taxonomy = stk.load_taxonomy(tree_taxonomy) | 618 | # we don't have data - NOTE we assume things are *not* updated here if we do |
1365 | 619 | tree_taxonomy[t] = get_tree_taxa_taxonomy_worms(t) | ||
1366 | 620 | |||
1367 | 621 | if save_taxonomy: | ||
1368 | 622 | if (verbose): | ||
1369 | 623 | print "Saving tree taxonomy" | ||
1370 | 624 | # note -temporary save as we overwrite this file later. | ||
1371 | 625 | stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv') | ||
1372 | 626 | |||
1373 | 214 | # get taxonomy from worms | 627 | # get taxonomy from worms |
1375 | 215 | taxonomy, start_level = get_taxonomy_worms(taxonomy,top_level) | 628 | if verbose: |
1376 | 629 | print "Now dealing with all other taxa - this might take a while..." | ||
1377 | 630 | # create a temp file so we can checkpoint and continue | ||
1378 | 631 | tmpf, tmpfile = tempfile.mkstemp() | ||
1379 | 632 | |||
1380 | 633 | if os.path.isfile('.fit_lock'): | ||
1381 | 634 | f = open('.fit_lock','r') | ||
1382 | 635 | tf = f.read() | ||
1383 | 636 | f.close() | ||
1384 | 637 | if os.path.isfile(tf.strip()): | ||
1385 | 638 | taxonomy = stk.load_taxonomy(tf.strip()) | ||
1386 | 639 | os.remove('.fit_lock') | ||
1387 | 640 | |||
1388 | 641 | # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue | ||
1389 | 642 | # where we left off. | ||
1390 | 643 | with open(".fit_lock", 'w') as f: | ||
1391 | 644 | f.write(tmpfile) | ||
1392 | 645 | # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function | ||
1393 | 646 | taxonomy, start_level = get_taxonomy_worms(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there | ||
1394 | 647 | |||
1395 | 648 | # clean up | ||
1396 | 649 | os.close(tmpf) | ||
1397 | 650 | os.remove('.fit_lock') | ||
1398 | 651 | try: | ||
1399 | 652 | os.remove(tmpfile) | ||
1400 | 653 | except OSError: | ||
1401 | 654 | pass | ||
1402 | 216 | 655 | ||
1403 | 217 | elif (pref_db == 'ncbi'): | 656 | elif (pref_db == 'ncbi'): |
1404 | 218 | # get taxonomy from ncbi | 657 | # get taxonomy from ncbi |
1405 | 219 | print "Sorry, NCBI is not implemented yet" | 658 | print "Sorry, NCBI is not implemented yet" |
1406 | 220 | pass | 659 | pass |
1407 | 660 | elif (pref_db == 'eol'): | ||
1408 | 661 | if (verbose): | ||
1409 | 662 | print "Getting data from EOL" | ||
1410 | 663 | # get tree taxonomy from EOL | ||
1411 | 664 | if (verbose): | ||
1412 | 665 | print "Dealing with taxa in tree" | ||
1413 | 666 | for t in taxa_list: | ||
1414 | 667 | if verbose: | ||
1415 | 668 | print "\t"+t | ||
1416 | 669 | try: | ||
1417 | 670 | tree_taxonomy[t] | ||
1418 | 671 | pass # we have data - NOTE we assume things are *not* updated here... | ||
1419 | 672 | except KeyError: | ||
1420 | 673 | try: | ||
1421 | 674 | tree_taxonomy[t.replace('_',' ')] | ||
1422 | 675 | except KeyError: | ||
1423 | 676 | tree_taxonomy[t] = get_tree_taxa_taxonomy_eol(t) | ||
1424 | 677 | |||
1425 | 678 | if save_taxonomy: | ||
1426 | 679 | if (verbose): | ||
1427 | 680 | print "Saving tree taxonomy" | ||
1428 | 681 | # note -temporary save as we overwrite this file later. | ||
1429 | 682 | stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv') | ||
1430 | 683 | |||
1431 | 684 | # get taxonomy from EOL | ||
1432 | 685 | if verbose: | ||
1433 | 686 | print "Now dealing with all other taxa - this might take a while..." | ||
1434 | 687 | # create a temp file so we can checkpoint and continue | ||
1435 | 688 | tmpf, tmpfile = tempfile.mkstemp() | ||
1436 | 689 | |||
1437 | 690 | if os.path.isfile('.fit_lock'): | ||
1438 | 691 | f = open('.fit_lock','r') | ||
1439 | 692 | tf = f.read() | ||
1440 | 693 | f.close() | ||
1441 | 694 | if os.path.isfile(tf.strip()): | ||
1442 | 695 | taxonomy = stk.load_taxonomy(tf.strip()) | ||
1443 | 696 | os.remove('.fit_lock') | ||
1444 | 697 | |||
1445 | 698 | # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue | ||
1446 | 699 | # where we left off. | ||
1447 | 700 | with open(".fit_lock", 'w') as f: | ||
1448 | 701 | f.write(tmpfile) | ||
1449 | 702 | # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function | ||
1450 | 703 | taxonomy, start_level = get_taxonomy_eol(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there | ||
1451 | 704 | |||
1452 | 705 | # clean up | ||
1453 | 706 | os.close(tmpf) | ||
1454 | 707 | os.remove('.fit_lock') | ||
1455 | 708 | try: | ||
1456 | 709 | os.remove(tmpfile) | ||
1457 | 710 | except OSError: | ||
1458 | 711 | pass | ||
1459 | 221 | else: | 712 | else: |
1461 | 222 | print "ERROR: Didn't understand you database choice" | 713 | print "ERROR: Didn't understand your database choice" |
1462 | 223 | sys.exit(-1) | 714 | sys.exit(-1) |
1463 | 224 | 715 | ||
1464 | 225 | # clean up taxonomy, deleting the ones already in the tree | 716 | # clean up taxonomy, deleting the ones already in the tree |
1465 | 226 | for taxon in taxa_list: | 717 | for taxon in taxa_list: |
1468 | 227 | taxon = taxon.replace('_',' ') | 718 | taxon = taxon.replace('_',' ') |
1469 | 228 | del taxonomy[taxon] | 719 | try: |
1470 | 720 | del taxonomy[taxon] | ||
1471 | 721 | except KeyError: | ||
1472 | 722 | pass # if it's not there, do we care? | ||
1473 | 723 | |||
1474 | 724 | # We now have 2 taxonomies: | ||
1475 | 725 | # - for taxa in the tree | ||
1476 | 726 | # - for all other taxa in the clade of interest | ||
1477 | 727 | |||
1478 | 728 | if save_taxonomy: | ||
1479 | 729 | tot_taxonomy = taxonomy.copy() | ||
1480 | 730 | tot_taxonomy.update(tree_taxonomy) | ||
1481 | 731 | stk.save_taxonomy(tot_taxonomy,save_taxonomy_file) | ||
1482 | 732 | |||
1483 | 733 | |||
1484 | 734 | orig_taxa_list = taxa_list | ||
1485 | 735 | |||
1486 | 736 | remove_higher_level = [] # for storing the higher level taxa in the original tree that need deleting | ||
1487 | 737 | generic = [] | ||
1488 | 738 | # find all the generic and build an internal subs file | ||
1489 | 739 | for t in taxa_list: | ||
1490 | 740 | t = t.replace(" ","_") | ||
1491 | 741 | if t.find("_") == -1: | ||
1492 | 742 | # no underscore, so just generic | ||
1493 | 743 | generic.append(t) | ||
1494 | 229 | 744 | ||
1495 | 230 | # step up the taxonomy levels from genus, adding taxa to the correct node | 745 | # step up the taxonomy levels from genus, adding taxa to the correct node |
1496 | 231 | # as a polytomy | 746 | # as a polytomy |
1498 | 232 | for level in taxonomy_levels[1::]: # skip species.... | 747 | start_level = start_level.encode('utf-8').strip() |
1499 | 748 | if verbose: | ||
1500 | 749 | print "I think your start OTU is at: ", start_level | ||
1501 | 750 | for level in tlevels[1::]: # skip species.... | ||
1502 | 751 | if verbose: | ||
1503 | 752 | print "Dealing with ",level | ||
1504 | 233 | new_taxa = [] | 753 | new_taxa = [] |
1505 | 234 | for t in taxonomy: | 754 | for t in taxonomy: |
1506 | 235 | # skip odd ones that should be in there | 755 | # skip odd ones that should be in there |
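All three database branches above use the same checkpoint scheme: the growing taxonomy is saved to a temporary file as it is fetched, that file's path is recorded in .fit_lock, and if .fit_lock is found on the next run the partial taxonomy is reloaded so the downloads carry on where they stopped. The idea in isolation (a sketch; load stands in for stk.load_taxonomy):

    import os
    import tempfile

    LOCK = ".fit_lock"

    def resume_taxonomy(load):
        """Return a previously checkpointed taxonomy, or {} if there is none."""
        taxonomy = {}
        if os.path.isfile(LOCK):
            with open(LOCK) as f:
                checkpoint = f.read().strip()
            if os.path.isfile(checkpoint):
                taxonomy = load(checkpoint)
            os.remove(LOCK)
        return taxonomy

    def start_checkpoint():
        """Create a fresh checkpoint file and record its path in the lock file."""
        handle, path = tempfile.mkstemp()
        with open(LOCK, "w") as f:
            f.write(path)
        return handle, path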
1507 | @@ -239,135 +759,61 @@ | |||
1508 | 239 | except KeyError: | 759 | except KeyError: |
1509 | 240 | continue # don't have this info | 760 | continue # don't have this info |
1510 | 241 | new_taxa = _uniquify(new_taxa) | 761 | new_taxa = _uniquify(new_taxa) |
1511 | 762 | |||
1512 | 242 | for nt in new_taxa: | 763 | for nt in new_taxa: |
1514 | 243 | taxa_to_add = [] | 764 | taxa_to_add = {} |
1515 | 244 | taxa_in_clade = [] | 765 | taxa_in_clade = [] |
1516 | 245 | for t in taxonomy: | 766 | for t in taxonomy: |
1517 | 246 | if start_level in taxonomy[t] and taxonomy[t][start_level] == top_level: | 767 | if start_level in taxonomy[t] and taxonomy[t][start_level] == top_level: |
1518 | 247 | try: | 768 | try: |
1521 | 248 | if taxonomy[t][level] == nt: | 769 | if taxonomy[t][level] == nt and not t in taxa_list: |
1522 | 249 | taxa_to_add.append(t.replace(' ','_')) | 770 | taxa_to_add[t] = taxonomy[t] |
1523 | 250 | except KeyError: | 771 | except KeyError: |
1524 | 251 | continue | 772 | continue |
1525 | 773 | |||
1526 | 252 | # add to tree | 774 | # add to tree |
1527 | 253 | for t in taxa_list: | 775 | for t in taxa_list: |
1528 | 254 | if level in tree_taxonomy[t] and tree_taxonomy[t][level] == nt: | 776 | if level in tree_taxonomy[t] and tree_taxonomy[t][level] == nt: |
1529 | 255 | taxa_in_clade.append(t) | 777 | taxa_in_clade.append(t) |
1536 | 256 | if len(taxa_in_clade) > 0: | 778 | if t in generic: |
1537 | 257 | tree = add_taxa(tree, taxa_to_add, taxa_in_clade) | 779 | # we are appending taxa to this higher taxon, so we need to remove it |
1538 | 258 | for t in taxa_to_add: # clean up taxonomy | 780 | remove_higher_level.append(t) |
1539 | 259 | del taxonomy[t.replace('_',' ')] | 781 | |
1540 | 260 | 782 | ||
1541 | 261 | 783 | if len(taxa_in_clade) > 0 and len(taxa_to_add) > 0: | |
1542 | 784 | tree = add_taxa(tree, taxa_to_add, taxa_in_clade,level) | ||
1543 | 785 | try: | ||
1544 | 786 | taxa_list = stk._getTaxaFromNewick(tree) | ||
1545 | 787 | except stk.TreeParseError as e: | ||
1546 | 788 | print taxa_to_add, taxa_in_clade, level, tree | ||
1547 | 789 | print e.msg | ||
1548 | 790 | return | ||
1549 | 791 | |||
1550 | 792 | for t in taxa_to_add: | ||
1551 | 793 | tree_taxonomy[t.replace(' ','_')] = taxa_to_add[t] | ||
1552 | 794 | try: | ||
1553 | 795 | del taxonomy[t.replace('_',' ')] | ||
1554 | 796 | except KeyError: | ||
1555 | 797 | # It might have _ or it might not... | ||
1556 | 798 | del taxonomy[t] | ||
1557 | 799 | |||
1558 | 800 | |||
1559 | 801 | # remove singleton nodes | ||
1560 | 802 | tree = stk._collapse_nodes(tree) | ||
1561 | 803 | tree = stk._collapse_nodes(tree) | ||
1562 | 804 | tree = stk._collapse_nodes(tree) | ||
1563 | 805 | |||
1564 | 806 | tree = stk._sub_taxa_in_tree(tree, remove_higher_level) | ||
1565 | 262 | trees = {} | 807 | trees = {} |
1566 | 263 | trees['tree_1'] = tree | 808 | trees['tree_1'] = tree |
1567 | 264 | output = stk._amalgamate_trees(trees,format='nexus') | 809 | output = stk._amalgamate_trees(trees,format='nexus') |
1568 | 265 | f = open(output_file, "w") | 810 | f = open(output_file, "w") |
1569 | 266 | f.write(output) | 811 | f.write(output) |
1570 | 267 | f.close() | 812 | f.close() |
1674 | 268 | 813 | taxa_list = stk._getTaxaFromNewick(tree) | |
1675 | 269 | if not save_taxonomy_file == None: | 814 | |
1676 | 270 | with open(save_taxonomy_file, 'w') as f: | 815 | print "Final taxa count:", len(taxa_list) |
1677 | 271 | writer = csv.writer(f) | 816 | |
1575 | 272 | headers = [] | ||
1576 | 273 | headers.append("OTU") | ||
1577 | 274 | headers.extend(taxonomy_levels) | ||
1578 | 275 | headers.append("Data source") | ||
1579 | 276 | writer.writerow(headers) | ||
1580 | 277 | for t in taxonomy: | ||
1581 | 278 | otu = t | ||
1582 | 279 | try: | ||
1583 | 280 | species = taxonomy[t]['species'] | ||
1584 | 281 | except KeyError: | ||
1585 | 282 | species = "-" | ||
1586 | 283 | try: | ||
1587 | 284 | genus = taxonomy[t]['genus'] | ||
1588 | 285 | except KeyError: | ||
1589 | 286 | genus = "-" | ||
1590 | 287 | try: | ||
1591 | 288 | family = taxonomy[t]['family'] | ||
1592 | 289 | except KeyError: | ||
1593 | 290 | family = "-" | ||
1594 | 291 | try: | ||
1595 | 292 | superfamily = taxonomy[t]['superfamily'] | ||
1596 | 293 | except KeyError: | ||
1597 | 294 | superfamily = "-" | ||
1598 | 295 | try: | ||
1599 | 296 | infraorder = taxonomy[t]['infraorder'] | ||
1600 | 297 | except KeyError: | ||
1601 | 298 | infraorder = "-" | ||
1602 | 299 | try: | ||
1603 | 300 | suborder = taxonomy[t]['suborder'] | ||
1604 | 301 | except KeyError: | ||
1605 | 302 | suborder = "-" | ||
1606 | 303 | try: | ||
1607 | 304 | order = taxonomy[t]['order'] | ||
1608 | 305 | except KeyError: | ||
1609 | 306 | order = "-" | ||
1610 | 307 | try: | ||
1611 | 308 | superorder = taxonomy[t]['superorder'] | ||
1612 | 309 | except KeyError: | ||
1613 | 310 | superorder = "-" | ||
1614 | 311 | try: | ||
1615 | 312 | subclass = taxonomy[t]['subclass'] | ||
1616 | 313 | except KeyError: | ||
1617 | 314 | subclass = "-" | ||
1618 | 315 | try: | ||
1619 | 316 | tclass = taxonomy[t]['class'] | ||
1620 | 317 | except KeyError: | ||
1621 | 318 | tclass = "-" | ||
1622 | 319 | try: | ||
1623 | 320 | subphylum = taxonomy[t]['subphylum'] | ||
1624 | 321 | except KeyError: | ||
1625 | 322 | subphylum = "-" | ||
1626 | 323 | try: | ||
1627 | 324 | phylum = taxonomy[t]['phylum'] | ||
1628 | 325 | except KeyError: | ||
1629 | 326 | phylum = "-" | ||
1630 | 327 | try: | ||
1631 | 328 | superphylum = taxonomy[t]['superphylum'] | ||
1632 | 329 | except KeyError: | ||
1633 | 330 | superphylum = "-" | ||
1634 | 331 | try: | ||
1635 | 332 | infrakingdom = taxonomy[t]['infrakingdom'] | ||
1636 | 333 | except: | ||
1637 | 334 | infrakingdom = "-" | ||
1638 | 335 | try: | ||
1639 | 336 | subkingdom = taxonomy[t]['subkingdom'] | ||
1640 | 337 | except: | ||
1641 | 338 | subkingdom = "-" | ||
1642 | 339 | try: | ||
1643 | 340 | kingdom = taxonomy[t]['kingdom'] | ||
1644 | 341 | except KeyError: | ||
1645 | 342 | kingdom = "-" | ||
1646 | 343 | try: | ||
1647 | 344 | provider = taxonomy[t]['provider'] | ||
1648 | 345 | except KeyError: | ||
1649 | 346 | provider = "-" | ||
1650 | 347 | |||
1651 | 348 | if (isinstance(species, list)): | ||
1652 | 349 | species = " ".join(species) | ||
1653 | 350 | this_classification = [ | ||
1654 | 351 | otu.encode('utf-8'), | ||
1655 | 352 | species.encode('utf-8'), | ||
1656 | 353 | genus.encode('utf-8'), | ||
1657 | 354 | family.encode('utf-8'), | ||
1658 | 355 | superfamily.encode('utf-8'), | ||
1659 | 356 | infraorder.encode('utf-8'), | ||
1660 | 357 | suborder.encode('utf-8'), | ||
1661 | 358 | order.encode('utf-8'), | ||
1662 | 359 | superorder.encode('utf-8'), | ||
1663 | 360 | subclass.encode('utf-8'), | ||
1664 | 361 | tclass.encode('utf-8'), | ||
1665 | 362 | subphylum.encode('utf-8'), | ||
1666 | 363 | phylum.encode('utf-8'), | ||
1667 | 364 | superphylum.encode('utf-8'), | ||
1668 | 365 | infrakingdom.encode('utf-8'), | ||
1669 | 366 | subkingdom.encode('utf-8'), | ||
1670 | 367 | kingdom.encode('utf-8'), | ||
1671 | 368 | provider.encode('utf-8')] | ||
1672 | 369 | writer.writerow(this_classification) | ||
1673 | 370 | |||
1678 | 371 | 817 | ||
1679 | 372 | def _uniquify(l): | 818 | def _uniquify(l): |
1680 | 373 | """ | 819 | """ |
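_uniquify builds a dict keyed by the list's elements and returns its keys, so the order of the result is arbitrary. Where a stable order would help - for instance keeping group labels in first-seen order between runs - an order-preserving variant is simple (a sketch, not part of the toolkit):

    def _uniquify_ordered(l):
        """Return the unique elements of l in the order they first appear."""
        seen = set()
        out = []
        for e in l:
            if e not in seen:
                seen.add(e)
                out.append(e)
        return out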
1681 | @@ -379,28 +825,119 @@ | |||
1682 | 379 | 825 | ||
1683 | 380 | return keys.keys() | 826 | return keys.keys() |
1684 | 381 | 827 | ||
1686 | 382 | def add_taxa(tree, new_taxa, taxa_in_clade): | 828 | def add_taxa(tree, new_taxa, taxa_in_clade, level): |
1687 | 383 | 829 | ||
1688 | 384 | # create new tree of the new taxa | 830 | # create new tree of the new taxa |
1691 | 385 | #tree_string = "(" + ",".join(new_taxa) + ");" | 831 | additionalTaxa = tree_from_taxonomy(level,new_taxa) |
1690 | 386 | #additionalTaxa = stk._parse_tree(tree_string) | ||
1692 | 387 | 832 | ||
1693 | 388 | # find mrca parent | 833 | # find mrca parent |
1694 | 389 | treeobj = stk._parse_tree(tree) | 834 | treeobj = stk._parse_tree(tree) |
1695 | 390 | mrca = stk.get_mrca(tree,taxa_in_clade) | 835 | mrca = stk.get_mrca(tree,taxa_in_clade) |
1707 | 391 | mrca_parent = treeobj.node(mrca).parent | 836 | if (mrca == 0): |
1708 | 392 | 837 | # we need to make a new tree! The additional taxa are being placed at the root of the tree | |
1709 | 393 | # insert a node into the tree between the MRCA and it's parent (p4.addNodeBetweenNodes) | 838 | t = Tree() |
1710 | 394 | newNode = treeobj.addNodeBetweenNodes(mrca, mrca_parent) | 839 | A = t.add_child() |
1711 | 395 | 840 | B = t.add_child() | |
1712 | 396 | # add the new tree at the new node using p4.addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None) | 841 | t1 = Tree(additionalTaxa) |
1713 | 397 | #treeobj.addSubTree(newNode, additionalTaxa) | 842 | t2 = Tree(tree) |
1714 | 398 | for t in new_taxa: | 843 | A.add_child(t1) |
1715 | 399 | treeobj.addSibLeaf(newNode,t) | 844 | B.add_child(t2) |
1716 | 400 | 845 | return t.write(format=9) | |
1717 | 401 | # return new tree | 846 | else: |
1718 | 847 | mrca = treeobj.nodes[mrca] | ||
1719 | 848 | additionalTaxa = stk._parse_tree(additionalTaxa) | ||
1720 | 849 | |||
1721 | 850 | if len(taxa_in_clade) == 1: | ||
1722 | 851 | taxon = treeobj.node(taxa_in_clade[0]) | ||
1723 | 852 | mrca = treeobj.addNodeBetweenNodes(taxon,mrca) | ||
1724 | 853 | |||
1725 | 854 | |||
1726 | 855 | # insert a node into the tree between the MRCA and its parent (p4.addNodeBetweenNodes) | ||
1727 | 856 | # newNode = treeobj.addNodeBetweenNodes(mrca, mrca_parent) | ||
1728 | 857 | |||
1729 | 858 | # add the new tree at the new node using p4.addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None) | ||
1730 | 859 | treeobj.addSubTree(mrca, additionalTaxa, ignoreRootAssert=True) | ||
1731 | 860 | |||
1732 | 402 | return treeobj.writeNewick(fName=None,toString=True).strip() | 861 | return treeobj.writeNewick(fName=None,toString=True).strip() |
1733 | 403 | 862 | ||
1734 | 863 | |||
1735 | 864 | |||
1736 | 865 | def tree_from_taxonomy(top_level, tree_taxonomy): | ||
1737 | 866 | |||
1738 | 867 | start_level = taxonomy_levels.index(top_level) | ||
1739 | 868 | new_taxa = tree_taxonomy.keys() | ||
1740 | 869 | |||
1741 | 870 | tl_types = [] | ||
1742 | 871 | for tt in tree_taxonomy: | ||
1743 | 872 | tl_types.append(tree_taxonomy[tt][top_level]) | ||
1744 | 873 | |||
1745 | 874 | tl_types = _uniquify(tl_types) | ||
1746 | 875 | levels_to_worry_about = tlevels[0:tlevels.index(top_level)+1] | ||
1747 | 876 | |||
1748 | 877 | t = Tree() | ||
1749 | 878 | nodes = {} | ||
1750 | 879 | nodes[top_level] = [] | ||
1751 | 880 | for tl in tl_types: | ||
1752 | 881 | n = t.add_child(name=tl) | ||
1753 | 882 | nodes[top_level].append({tl:n}) | ||
1754 | 883 | |||
1755 | 884 | for l in levels_to_worry_about[-2::-1]: | ||
1756 | 885 | names = [] | ||
1757 | 886 | nodes[l] = [] | ||
1758 | 887 | ci = levels_to_worry_about.index(l) | ||
1759 | 888 | for tt in tree_taxonomy: | ||
1760 | 889 | try: | ||
1761 | 890 | names.append(tree_taxonomy[tt][l]) | ||
1762 | 891 | except KeyError: | ||
1763 | 892 | pass | ||
1764 | 893 | names = _uniquify(names) | ||
1765 | 894 | for n in names: | ||
1766 | 895 | # find my parent | ||
1767 | 896 | parent = None | ||
1768 | 897 | for tt in tree_taxonomy: | ||
1769 | 898 | try: | ||
1770 | 899 | if tree_taxonomy[tt][l] == n: | ||
1771 | 900 | try: | ||
1772 | 901 | parent = tree_taxonomy[tt][levels_to_worry_about[ci+1]] | ||
1773 | 902 | level = ci+1 | ||
1774 | 903 | except KeyError: | ||
1775 | 904 | try: | ||
1776 | 905 | parent = tree_taxonomy[tt][levels_to_worry_about[ci+2]] | ||
1777 | 906 | level = ci+2 | ||
1778 | 907 | except KeyError: | ||
1779 | 908 | try: | ||
1780 | 909 | parent = tree_taxonomy[tt][levels_to_worry_about[ci+3]] | ||
1781 | 910 | level = ci+3 | ||
1782 | 911 | except KeyError: | ||
1783 | 912 | print "ERROR: tried to find some taxonomic info for "+tt+" from tree_taxonomy file/downloaded data and I went two levels up, but failed find any. Looked at:\n" | ||
1784 | 913 | print "\t"+levels_to_worry_about[ci+1] | ||
1785 | 914 | print "\t"+levels_to_worry_about[ci+2] | ||
1786 | 915 | print "\t"+levels_to_worry_about[ci+3] | ||
1787 | 916 | print "This is the taxonomy info I have for "+tt | ||
1788 | 917 | print tree_taxonomy[tt] | ||
1789 | 918 | sys.exit(1) | ||
1790 | 919 | |||
1791 | 920 | k = [] | ||
1792 | 921 | for nd in nodes[levels_to_worry_about[level]]: | ||
1793 | 922 | k.extend(nd.keys()) | ||
1794 | 923 | i = 0 | ||
1795 | 924 | for kk in k: | ||
1796 | 925 | if kk == parent: | ||
1797 | 926 | break | ||
1798 | 927 | i += 1 | ||
1799 | 928 | parent_id = i | ||
1800 | 929 | break | ||
1801 | 930 | except KeyError: | ||
1802 | 931 | pass # no data at this level for this beastie | ||
1803 | 932 | # find out where to attach it | ||
1804 | 933 | node_id = nodes[levels_to_worry_about[level]][parent_id][parent] | ||
1805 | 934 | nd = node_id.add_child(name=n.replace(" ","_")) | ||
1806 | 935 | nodes[l].append({n:nd}) | ||
1807 | 936 | |||
1808 | 937 | tree = t.write(format=9) | ||
1809 | 938 | |||
1810 | 939 | return tree | ||
1811 | 940 | |||
1812 | 404 | if __name__ == "__main__": | 941 | if __name__ == "__main__": |
1813 | 405 | main() | 942 | main() |
1814 | 406 | 943 | ||
1815 | 407 | 944 | ||
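tree_from_taxonomy above grows the scaffold level by level with ete2, looking up each name's parent one, two or three ranks up when the taxonomy has gaps. Stripped of that gap handling, the core of turning parent/child relationships into a Newick scaffold looks like this (a simplified sketch using the same ete2 dependency; the example mapping is made up):

    from ete2 import Tree

    def tree_from_parent_map(children_of, root):
        """Build a tree from a {parent: [children]} mapping and return Newick."""
        t = Tree()
        t.name = root
        stack = [(root, t)]
        while stack:
            name, node = stack.pop()
            for child in children_of.get(name, []):
                stack.append((child, node.add_child(name=child)))
        return t.write(format=9)

    # e.g. tree_from_parent_map({'Decapoda': ['Brachyura', 'Anomura'],
    #                            'Brachyura': ['Cancer_pagurus']}, 'Decapoda')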
1816 | === modified file 'stk/scripts/plot_character_taxa_matrix.py' | |||
1817 | --- stk/scripts/plot_character_taxa_matrix.py 2014-12-10 08:55:43 +0000 | |||
1818 | +++ stk/scripts/plot_character_taxa_matrix.py 2017-01-12 09:27:31 +0000 | |||
1819 | @@ -42,6 +42,18 @@ | |||
1820 | 42 | default=False | 42 | default=False |
1821 | 43 | ) | 43 | ) |
1822 | 44 | parser.add_argument( | 44 | parser.add_argument( |
1823 | 45 | '-t', | ||
1824 | 46 | '--taxonomy', | ||
1825 | 47 | help="Use taxonomy to sort the taxa on the axis. Supply a STK taxonomy file", | ||
1826 | 48 | ) | ||
1827 | 49 | parser.add_argument( | ||
1828 | 50 | '--level', | ||
1829 | 51 | choices=['family','superfamily','infraorder','suborder','order'], | ||
1830 | 52 | default='family', | ||
1831 | 53 | help="""What level to group the taxonomy at. Default is family. | ||
1832 | 54 | Note data for a particular level may be missing in taxonomy.""" | ||
1833 | 55 | ) | ||
1834 | 56 | parser.add_argument( | ||
1835 | 45 | 'input_file', | 57 | 'input_file', |
1836 | 46 | metavar='input_file', | 58 | metavar='input_file', |
1837 | 47 | nargs=1, | 59 | nargs=1, |
1838 | @@ -59,14 +71,58 @@ | |||
1839 | 59 | verbose = args.verbose | 71 | verbose = args.verbose |
1840 | 60 | input_file = args.input_file[0] | 72 | input_file = args.input_file[0] |
1841 | 61 | output_file = args.output_file[0] | 73 | output_file = args.output_file[0] |
1842 | 74 | taxonomy = args.taxonomy | ||
1843 | 75 | level = args.level | ||
1844 | 62 | 76 | ||
1845 | 63 | XML = stk.load_phyml(input_file) | 77 | XML = stk.load_phyml(input_file) |
1846 | 78 | if not taxonomy == None: | ||
1847 | 79 | taxonomy = stk.load_taxonomy(taxonomy) | ||
1848 | 80 | |||
1849 | 64 | all_taxa = stk.get_all_taxa(XML) | 81 | all_taxa = stk.get_all_taxa(XML) |
1850 | 65 | all_chars_d = stk.get_all_characters(XML) | 82 | all_chars_d = stk.get_all_characters(XML) |
1851 | 66 | all_chars = [] | 83 | all_chars = [] |
1852 | 67 | for c in all_chars_d: | 84 | for c in all_chars_d: |
1853 | 68 | all_chars.extend(all_chars_d[c]) | 85 | all_chars.extend(all_chars_d[c]) |
1854 | 69 | 86 | ||
1855 | 87 | if not taxonomy == None: | ||
1856 | 88 | tax_data = {} | ||
1857 | 89 | new_all_taxa = [] | ||
1858 | 90 | for t in all_taxa: | ||
1859 | 91 | taxon = t.replace("_"," ") | ||
1860 | 92 | try: | ||
1861 | 93 | if taxonomy[taxon][level] == "": | ||
1862 | 94 | # skip this | ||
1863 | 95 | continue | ||
1864 | 96 | tax_data[t] = taxonomy[taxon][level] | ||
1865 | 97 | except KeyError: | ||
1866 | 98 | print "Couldn't find "+t+" in taxonomy. Adding as null data" | ||
1867 | 99 | tax_data[t] = 'zzzzz' # it's at the end... | ||
1868 | 100 | |||
1869 | 101 | from sets import Set | ||
1870 | 102 | unique = set(tax_data.values()) | ||
1871 | 103 | unique = list(unique) | ||
1872 | 104 | unique.sort() | ||
1873 | 105 | print "Groups are:" | ||
1874 | 106 | print unique | ||
1875 | 107 | counts = [] | ||
1876 | 108 | for u in unique: | ||
1877 | 109 | count = 0 | ||
1878 | 110 | for t in tax_data: | ||
1879 | 111 | if tax_data[t] == u: | ||
1880 | 112 | count += 1 | ||
1881 | 113 | new_all_taxa.append(t) | ||
1882 | 114 | counts.append(count) | ||
1883 | 115 | |||
1884 | 116 | all_taxa = new_all_taxa | ||
1885 | 117 | # cumulate counts | ||
1886 | 118 | count_cumulate = [] | ||
1887 | 119 | count_cumulate.append(counts[0]) | ||
1888 | 120 | for c in counts[1::]: | ||
1889 | 121 | count_cumulate.append(c+count_cumulate[-1]) | ||
1890 | 122 | |||
1891 | 123 | print count_cumulate | ||
1892 | 124 | |||
1893 | 125 | |||
1894 | 70 | taxa_character_matrix = {} | 126 | taxa_character_matrix = {} |
1895 | 71 | for t in all_taxa: | 127 | for t in all_taxa: |
1896 | 72 | taxa_character_matrix[t] = [] | 128 | taxa_character_matrix[t] = [] |
1897 | @@ -77,7 +133,8 @@ | |||
1898 | 77 | taxa = stk.get_taxa_from_tree(XML,t, sort=True) | 133 | taxa = stk.get_taxa_from_tree(XML,t, sort=True) |
1899 | 78 | for taxon in taxa: | 134 | for taxon in taxa: |
1900 | 79 | taxon = taxon.replace(" ","_") | 135 | taxon = taxon.replace(" ","_") |
1902 | 80 | taxa_character_matrix[taxon].extend(chars) | 136 | if taxon in all_taxa: |
1903 | 137 | taxa_character_matrix[taxon].extend(chars) | ||
1904 | 81 | 138 | ||
1905 | 82 | for t in taxa_character_matrix: | 139 | for t in taxa_character_matrix: |
1906 | 83 | array = taxa_character_matrix[t] | 140 | array = taxa_character_matrix[t] |
1907 | @@ -92,6 +149,31 @@ | |||
1908 | 92 | x.append(i) | 149 | x.append(i) |
1909 | 93 | y.append(j) | 150 | y.append(j) |
1910 | 94 | 151 | ||
1911 | 152 | |||
1912 | 153 | i = 0 | ||
1913 | 154 | for j in all_chars: | ||
1914 | 155 | # do a substitution of character names to tidy things up | ||
1915 | 156 | if j.lower().startswith('mitochondrial carrier; adenine nucleotide translocator'): | ||
1916 | 157 | j = "ANT" | ||
1917 | 158 | if j.lower().startswith('mitochondrially encoded 12s'): | ||
1918 | 159 | j = '12S' | ||
1919 | 160 | if j.lower().startswith('complete mitochondrial genome'): | ||
1920 | 161 | j = 'Mitogenome' | ||
1921 | 162 | if j.lower().startswith('mtdna'): | ||
1922 | 163 | j = "mtDNA restriction sites" | ||
1923 | 164 | if j.lower().startswith('h3 histone'): | ||
1924 | 165 | j = 'H3' | ||
1925 | 166 | if j.lower().startswith('mitochondrially encoded cytochrome'): | ||
1926 | 167 | j = 'COI' | ||
1927 | 168 | if j.lower().startswith('rna, 28s'): | ||
1928 | 169 | j = '28S' | ||
1929 | 170 | if j.lower().startswith('rna, 18s'): | ||
1930 | 171 | j = '18S' | ||
1931 | 172 | if j.lower().startswith('mitochondrially encoded 16s'): | ||
1932 | 173 | j = '16S' | ||
1933 | 174 | all_chars[i] = j | ||
1934 | 175 | i += 1 | ||
1935 | 176 | |||
1936 | 95 | fig=figure(figsize=(22,17),dpi=90) | 177 | fig=figure(figsize=(22,17),dpi=90) |
1937 | 96 | fig.subplots_adjust(left=0.3) | 178 | fig.subplots_adjust(left=0.3) |
1938 | 97 | ax = fig.add_subplot(1,1,1) | 179 | ax = fig.add_subplot(1,1,1) |
1939 | 98 | 180 | ||
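The run of if-statements above maps verbose character names onto short axis labels. The same mapping can be kept as a prefix table, which is easier to extend when new character names turn up (an equivalent sketch, not a change to the script; the prefixes are the ones tested above):

    CHAR_LABELS = [
        ('mitochondrial carrier; adenine nucleotide translocator', 'ANT'),
        ('mitochondrially encoded 12s', '12S'),
        ('complete mitochondrial genome', 'Mitogenome'),
        ('mtdna', 'mtDNA restriction sites'),
        ('h3 histone', 'H3'),
        ('mitochondrially encoded cytochrome', 'COI'),
        ('rna, 28s', '28S'),
        ('rna, 18s', '18S'),
        ('mitochondrially encoded 16s', '16S'),
    ]

    def short_label(character):
        """Return the short label for the first matching prefix, else the name."""
        for prefix, label in CHAR_LABELS:
            if character.lower().startswith(prefix):
                return label
        return character

    all_chars = [short_label(c) for c in all_chars]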
1940 | === modified file 'stk/scripts/plot_tree_taxa_matrix.py' | |||
1941 | --- stk/scripts/plot_tree_taxa_matrix.py 2014-12-10 08:55:43 +0000 | |||
1942 | +++ stk/scripts/plot_tree_taxa_matrix.py 2017-01-12 09:27:31 +0000 | |||
1943 | @@ -43,6 +43,18 @@ | |||
1944 | 43 | default=False | 43 | default=False |
1945 | 44 | ) | 44 | ) |
1946 | 45 | parser.add_argument( | 45 | parser.add_argument( |
1947 | 46 | '-t', | ||
1948 | 47 | '--taxonomy', | ||
1949 | 48 | help="Use taxonomy to sort the taxa on the axis. Supply a STK taxonomy file", | ||
1950 | 49 | ) | ||
1951 | 50 | parser.add_argument( | ||
1952 | 51 | '--level', | ||
1953 | 52 | choices=['family','superfamily','infraorder','suborder','order'], | ||
1954 | 53 | default='family', | ||
1955 | 54 | help="""What level to group the taxonomy at. Default is family. | ||
1956 | 55 | Note data for a particular level may be missing in taxonomy.""" | ||
1957 | 56 | ) | ||
1958 | 57 | parser.add_argument( | ||
1959 | 46 | 'input_file', | 58 | 'input_file', |
1960 | 47 | metavar='input_file', | 59 | metavar='input_file', |
1961 | 48 | nargs=1, | 60 | nargs=1, |
1962 | @@ -60,13 +72,57 @@ | |||
1963 | 60 | verbose = args.verbose | 72 | verbose = args.verbose |
1964 | 61 | input_file = args.input_file[0] | 73 | input_file = args.input_file[0] |
1965 | 62 | output_file = args.output_file[0] | 74 | output_file = args.output_file[0] |
1966 | 75 | taxonomy = args.taxonomy | ||
1967 | 76 | level = args.level | ||
1968 | 63 | 77 | ||
1969 | 64 | XML = stk.load_phyml(input_file) | 78 | XML = stk.load_phyml(input_file) |
1970 | 79 | if not taxonomy == None: | ||
1971 | 80 | taxonomy = stk.load_taxonomy(taxonomy) | ||
1972 | 81 | |||
1973 | 65 | all_taxa = stk.get_all_taxa(XML) | 82 | all_taxa = stk.get_all_taxa(XML) |
1974 | 66 | 83 | ||
1975 | 67 | taxa_tree_matrix = {} | 84 | taxa_tree_matrix = {} |
1976 | 68 | for t in all_taxa: | 85 | for t in all_taxa: |
1977 | 69 | taxa_tree_matrix[t] = [] | 86 | taxa_tree_matrix[t] = [] |
1978 | 87 | |||
1979 | 88 | if not taxonomy == None: | ||
1980 | 89 | tax_data = {} | ||
1981 | 90 | new_all_taxa = [] | ||
1982 | 91 | for t in all_taxa: | ||
1983 | 92 | taxon = t.replace("_"," ") | ||
1984 | 93 | try: | ||
1985 | 94 | if taxonomy[taxon][level] == "": | ||
1986 | 95 | # skip this | ||
1987 | 96 | continue | ||
1988 | 97 | tax_data[t] = taxonomy[taxon][level] | ||
1989 | 98 | except KeyError: | ||
1990 | 99 | print "Couldn't find "+t+" in taxonomy. Adding as null data" | ||
1991 | 100 | tax_data[t] = 'zzzzz' # it's at the end... | ||
1992 | 101 | |||
1993 | 102 | from sets import Set | ||
1994 | 103 | unique = set(tax_data.values()) | ||
1995 | 104 | unique = list(unique) | ||
1996 | 105 | unique.sort() | ||
1997 | 106 | print "Groups are:" | ||
1998 | 107 | print unique | ||
1999 | 108 | counts = [] | ||
2000 | 109 | for u in unique: | ||
2001 | 110 | count = 0 | ||
2002 | 111 | for t in tax_data: | ||
2003 | 112 | if tax_data[t] == u: | ||
2004 | 113 | count += 1 | ||
2005 | 114 | new_all_taxa.append(t) | ||
2006 | 115 | counts.append(count) | ||
2007 | 116 | |||
2008 | 117 | all_taxa = new_all_taxa | ||
2009 | 118 | # cumulate counts | ||
2010 | 119 | count_cumulate = [] | ||
2011 | 120 | count_cumulate.append(counts[0]) | ||
2012 | 121 | for c in counts[1::]: | ||
2013 | 122 | count_cumulate.append(c+count_cumulate[-1]) | ||
2014 | 123 | |||
2015 | 124 | print count_cumulate | ||
2016 | 125 | |||
2017 | 70 | 126 | ||
2018 | 71 | trees = stk.obtain_trees(XML) | 127 | trees = stk.obtain_trees(XML) |
2019 | 72 | i = 0 | 128 | i = 0 |
2020 | 73 | 129 | ||
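Both plotting scripts share the block above: taxa are bucketed by their group at the chosen taxonomy level, taxa missing from the taxonomy are pushed to the end under a dummy 'zzzzz' group, and the per-group counts are accumulated so group boundaries can be drawn on the axis. The same transform as one function (a sketch of the shared logic, not a drop-in replacement):

    def group_and_cumulate(all_taxa, taxonomy, level):
        """Return taxa ordered by group plus the running group totals."""
        group_of = {}
        for t in all_taxa:
            entry = taxonomy.get(t.replace("_", " "))
            if entry is None:
                group_of[t] = 'zzzzz'              # not in the taxonomy: sorts last
            elif entry.get(level):
                group_of[t] = entry[level]         # empty at this level: taxon dropped
        ordered, cumulative, running = [], [], 0
        for group in sorted(set(group_of.values())):
            members = [t for t in group_of if group_of[t] == group]
            ordered.extend(members)
            running += len(members)
            cumulative.append(running)
        return ordered, cumulative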
2021 | === modified file 'stk/scripts/remove_poorly_constrained_taxa.py' | |||
2022 | --- stk/scripts/remove_poorly_constrained_taxa.py 2014-04-18 11:57:14 +0000 | |||
2023 | +++ stk/scripts/remove_poorly_constrained_taxa.py 2017-01-12 09:27:31 +0000 | |||
2024 | @@ -12,8 +12,8 @@ | |||
2025 | 12 | 12 | ||
2026 | 13 | # do stuff | 13 | # do stuff |
2027 | 14 | parser = argparse.ArgumentParser( | 14 | parser = argparse.ArgumentParser( |
2030 | 15 | prog="convert tree from specific to generic", | 15 | prog="remove poorly constrained taxa",
2031 | 16 | description="""Converts a tree at specific level to generic level""", | 16 | description="""Remove taxa that appear in one source tree only.""",
2032 | 17 | ) | 17 | ) |
2033 | 18 | parser.add_argument( | 18 | parser.add_argument( |
2034 | 19 | '-v', | 19 | '-v', |
2035 | @@ -34,6 +34,13 @@ | |||
2036 | 34 | " to removal those in polytomies *and* only in one other tree." | 34 | " to removal those in polytomies *and* only in one other tree." |
2037 | 35 | ) | 35 | ) |
2038 | 36 | parser.add_argument( | 36 | parser.add_argument( |
2039 | 37 | '--tree_only', | ||
2040 | 38 | default=False, | ||
2041 | 39 | action='store_true', | ||
2042 | 40 | help="Restrict removal of taxa that only occur in one source tree. Default"+ | ||
2043 | 41 | " to removal those in polytomies *and* only in one other tree." | ||
2044 | 42 | ) | ||
2045 | 43 | parser.add_argument( | ||
2046 | 37 | 'input_phyml', | 44 | 'input_phyml', |
2047 | 38 | metavar='input_phyml', | 45 | metavar='input_phyml', |
2048 | 39 | nargs=1, | 46 | nargs=1, |
2049 | @@ -43,13 +50,13 @@ | |||
2050 | 43 | 'input_tree', | 50 | 'input_tree', |
2051 | 44 | metavar='input_tree', | 51 | metavar='input_tree', |
2052 | 45 | nargs=1, | 52 | nargs=1, |
2054 | 46 | help="Your tree" | 53 | help="Your tree - can be NULL or None" |
2055 | 47 | ) | 54 | ) |
2056 | 48 | parser.add_argument( | 55 | parser.add_argument( |
2057 | 49 | 'output_tree', | 56 | 'output_tree', |
2058 | 50 | metavar='output_tree', | 57 | metavar='output_tree', |
2059 | 51 | nargs=1, | 58 | nargs=1, |
2061 | 52 | help="Your output tree" | 59 | help="Your output tree or phyml - if input_tree is none, this is the Phyml" |
2062 | 53 | ) | 60 | ) |
2063 | 54 | 61 | ||
2064 | 55 | 62 | ||
2065 | @@ -62,14 +69,20 @@ | |||
2066 | 62 | dl = True | 69 | dl = True |
2067 | 63 | poly_only = args.poly_only | 70 | poly_only = args.poly_only |
2068 | 64 | input_tree = args.input_tree[0] | 71 | input_tree = args.input_tree[0] |
2070 | 65 | output_tree = args.output_tree[0] | 72 | if input_tree == 'NULL' or input_tree == 'None': |
2071 | 73 | input_tree = None | ||
2072 | 74 | output_file = args.output_tree[0] | ||
2073 | 66 | input_phyml = args.input_phyml[0] | 75 | input_phyml = args.input_phyml[0] |
2074 | 67 | 76 | ||
2075 | 68 | XML = stk.load_phyml(input_phyml) | 77 | XML = stk.load_phyml(input_phyml) |
2076 | 69 | # load tree | 78 | # load tree |
2078 | 70 | supertree = stk.import_tree(input_tree) | 79 | if (not input_tree == None): |
2079 | 80 | supertree = stk.import_tree(input_tree) | ||
2080 | 81 | taxa = stk._getTaxaFromNewick(supertree) | ||
2081 | 82 | else: | ||
2082 | 83 | supertree = None | ||
2083 | 84 | taxa = stk.get_all_taxa(XML) | ||
2084 | 71 | # grab taxa | 85 | # grab taxa |
2085 | 72 | taxa = stk._getTaxaFromNewick(supertree) | ||
2086 | 73 | delete_list = [] | 86 | delete_list = [] |
2087 | 74 | 87 | ||
2088 | 75 | # loop over taxa in supertree and get some stats | 88 | # loop over taxa in supertree and get some stats |
2089 | @@ -115,19 +128,29 @@ | |||
2090 | 115 | 128 | ||
2091 | 116 | print "Taxa: "+str(len(taxa)) | 129 | print "Taxa: "+str(len(taxa)) |
2092 | 117 | print "Deleting: "+str(len(delete_list)) | 130 | print "Deleting: "+str(len(delete_list)) |
2106 | 118 | # done, so delete the problem taxa from the supertree | 131 | |
2107 | 119 | for t in delete_list: | 132 | if not supertree == None: |
2108 | 120 | # remove taxa from supertree | 133 | # done, so delete the problem taxa from the supertree |
2109 | 121 | supertree = stk._sub_taxa_in_tree(supertree,t) | 134 | for t in delete_list: |
2110 | 122 | 135 | # remove taxa from supertree | |
2111 | 123 | # save supertree | 136 | supertree = stk._sub_taxa_in_tree(supertree,t) |
2112 | 124 | tree = {} | 137 | |
2113 | 125 | tree['Tree_1'] = supertree | 138 | # save supertree |
2114 | 126 | output = stk._amalgamate_trees(tree,format='nexus') | 139 | tree = {} |
2115 | 127 | # write file | 140 | tree['Tree_1'] = supertree |
2116 | 128 | f = open(output_tree,"w") | 141 | output = stk._amalgamate_trees(tree,format='nexus') |
2117 | 129 | f.write(output) | 142 | # write file |
2118 | 130 | f.close() | 143 | f = open(output_file,"w") |
2119 | 144 | f.write(output) | ||
2120 | 145 | f.close() | ||
2121 | 146 | else: | ||
2122 | 147 | new_phyml = stk.substitute_taxa(XML,delete_list) | ||
2123 | 148 | # write file | ||
2124 | 149 | f = open(output_file,"w") | ||
2125 | 150 | f.write(new_phyml) | ||
2126 | 151 | f.close() | ||
2127 | 152 | |||
2128 | 153 | |||
2129 | 131 | 154 | ||
2130 | 132 | if (dl): | 155 | if (dl): |
2131 | 133 | # write file | 156 | # write file |
2132 | 134 | 157 | ||
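With the changes above the script has two output modes: prune the supplied supertree and write it as nexus, or, when the tree argument is NULL/None, write a Phyml with the poorly constrained taxa substituted out. The branch condensed into one helper (a sketch using the same stk calls as the diff):

    def write_result(delete_list, supertree, XML, output_file):
        """Prune either the supertree or the Phyml and write the result."""
        if supertree is not None:
            for t in delete_list:
                supertree = stk._sub_taxa_in_tree(supertree, t)
            output = stk._amalgamate_trees({'Tree_1': supertree}, format='nexus')
        else:
            output = stk.substitute_taxa(XML, delete_list)
        with open(output_file, "w") as f:
            f.write(output)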
2133 | === added file 'stk/scripts/tree_from_taxonomy.py' | |||
2134 | --- stk/scripts/tree_from_taxonomy.py 1970-01-01 00:00:00 +0000 | |||
2135 | +++ stk/scripts/tree_from_taxonomy.py 2017-01-12 09:27:31 +0000 | |||
2136 | @@ -0,0 +1,142 @@ | |||
2137 | 1 | # trees ready for supertree construction. | ||
2138 | 2 | # Copyright (C) 2015, Jon Hill, Katie Davis | ||
2139 | 3 | # | ||
2140 | 4 | # This program is free software: you can redistribute it and/or modify | ||
2141 | 5 | # it under the terms of the GNU General Public License as published by | ||
2142 | 6 | # the Free Software Foundation, either version 3 of the License, or | ||
2143 | 7 | # (at your option) any later version. | ||
2144 | 8 | # | ||
2145 | 9 | # This program is distributed in the hope that it will be useful, | ||
2146 | 10 | # but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
2147 | 11 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
2148 | 12 | # GNU General Public License for more details. | ||
2149 | 13 | # | ||
2150 | 14 | # You should have received a copy of the GNU General Public License | ||
2151 | 15 | # along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
2152 | 16 | # | ||
2153 | 17 | # Jon Hill. jon.hill@york.ac.uk | ||
2154 | 18 | |||
2155 | 19 | import argparse | ||
2156 | 20 | import copy | ||
2157 | 21 | import os | ||
2158 | 22 | import sys | ||
2159 | 23 | stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir ) | ||
2160 | 24 | sys.path.insert(0, stk_path) | ||
2161 | 25 | import supertree_toolkit as stk | ||
2162 | 26 | import csv | ||
2163 | 27 | from ete2 import Tree | ||
2164 | 28 | |||
2165 | 29 | taxonomy_levels = ['species','subgenus','genus','subfamily','family','superfamily','subsection','section','infraorder','suborder','order','superorder','subclass','class','superclass','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] | ||
2166 | 30 | tlevels = ['species','genus','family','order','class','phylum','kingdom'] | ||
2167 | 31 | |||
2168 | 32 | |||
2169 | 33 | def main(): | ||
2170 | 34 | |||
2171 | 35 | # do stuff | ||
2172 | 36 | parser = argparse.ArgumentParser( | ||
2173 | 37 | prog="create a tree from a taxonomy file", | ||
2174 | 38 | description="Create a taxonomic tree", | ||
2175 | 39 | ) | ||
2176 | 40 | parser.add_argument( | ||
2177 | 41 | '-v', | ||
2178 | 42 | '--verbose', | ||
2179 | 43 | action='store_true', | ||
2180 | 44 | help="Verbose output: mainly progress reports.", | ||
2181 | 45 | default=False | ||
2182 | 46 | ) | ||
2183 | 47 | parser.add_argument( | ||
2184 | 48 | 'top_level', | ||
2185 | 49 | nargs=1, | ||
2186 | 50 | help="The top level group to start with, e.g. family" | ||
2187 | 51 | ) | ||
2188 | 52 | parser.add_argument( | ||
2189 | 53 | 'input_file', | ||
2190 | 54 | metavar='input_file', | ||
2191 | 55 | nargs=1, | ||
2192 | 56 | help="Your taxonomy file" | ||
2193 | 57 | ) | ||
2194 | 58 | parser.add_argument( | ||
2195 | 59 | 'output_file', | ||
2196 | 60 | metavar='output_file', | ||
2197 | 61 | nargs=1, | ||
2198 | 62 | help="Your new tree file" | ||
2199 | 63 | ) | ||
2200 | 64 | |||
2201 | 65 | args = parser.parse_args() | ||
2202 | 66 | verbose = args.verbose | ||
2203 | 67 | input_file = args.input_file[0] | ||
2204 | 68 | output_file = args.output_file[0] | ||
2205 | 69 | top_level = args.top_level[0] | ||
2206 | 70 | |||
2207 | 71 | start_level = taxonomy_levels.index(top_level) | ||
2208 | 72 | tree_taxonomy = stk.load_taxonomy(input_file) | ||
2209 | 73 | new_taxa = tree_taxonomy.keys() | ||
2210 | 74 | |||
2211 | 75 | tl_types = [] | ||
2212 | 76 | for tt in tree_taxonomy: | ||
2213 | 77 | tl_types.append(tree_taxonomy[tt][top_level]) | ||
2214 | 78 | |||
2215 | 79 | tl_types = _uniquify(tl_types) | ||
2216 | 80 | levels_to_worry_about = tlevels[0:tlevels.index(top_level)+1] | ||
2217 | 81 | |||
2218 | 82 | #print levels_to_worry_about[-2::-1] | ||
2219 | 83 | |||
2220 | 84 | t = Tree() | ||
2221 | 85 | nodes = {} | ||
2222 | 86 | nodes[top_level] = [] | ||
2223 | 87 | for tl in tl_types: | ||
2224 | 88 | n = t.add_child(name=tl) | ||
2225 | 89 | nodes[top_level].append({tl:n}) | ||
2226 | 90 | |||
2227 | 91 | for l in levels_to_worry_about[-2::-1]: | ||
2228 | 92 | #print t | ||
2229 | 93 | names = [] | ||
2230 | 94 | nodes[l] = [] | ||
2231 | 95 | ci = levels_to_worry_about.index(l) | ||
2232 | 96 | for tt in tree_taxonomy: | ||
2233 | 97 | names.append(tree_taxonomy[tt][l]) | ||
2234 | 98 | names = _uniquify(names) | ||
2235 | 99 | for n in names: | ||
2236 | 100 | #print n | ||
2237 | 101 | # find my parent | ||
2238 | 102 | parent = None | ||
2239 | 103 | for tt in tree_taxonomy: | ||
2240 | 104 | if tree_taxonomy[tt][l] == n: | ||
2241 | 105 | parent = tree_taxonomy[tt][levels_to_worry_about[ci+1]] | ||
2242 | 106 | k = [] | ||
2243 | 107 | for nd in nodes[levels_to_worry_about[ci+1]]: | ||
2244 | 108 | k.extend(nd.keys()) | ||
2245 | 109 | i = 0 | ||
2246 | 110 | for kk in k: | ||
2247 | 111 | print kk | ||
2248 | 112 | if kk == parent: | ||
2249 | 113 | break | ||
2250 | 114 | i += 1 | ||
2251 | 115 | parent_id = i | ||
2252 | 116 | break | ||
2253 | 117 | # find out where to attach it | ||
2254 | 118 | node_id = nodes[levels_to_worry_about[ci+1]][parent_id][parent] | ||
2255 | 119 | nd = node_id.add_child(name=n.replace(" ","_")) | ||
2256 | 120 | nodes[l].append({n:nd}) | ||
2257 | 121 | |||
2258 | 122 | tree = t.write(format=9) | ||
2259 | 123 | tree = stk._collapse_nodes(tree) | ||
2260 | 124 | tree = stk._collapse_nodes(tree) | ||
2261 | 125 | print tree | ||
2262 | 126 | |||
2263 | 127 | |||
2264 | 128 | def _uniquify(l): | ||
2265 | 129 | """ | ||
2266 | 130 | Make a list, l, contain only unique data | ||
2267 | 131 | """ | ||
2268 | 132 | keys = {} | ||
2269 | 133 | for e in l: | ||
2270 | 134 | keys[e] = 1 | ||
2271 | 135 | |||
2272 | 136 | return keys.keys() | ||
2273 | 137 | |||
2274 | 138 | if __name__ == "__main__": | ||
2275 | 139 | main() | ||
2276 | 140 | |||
2277 | 141 | |||
2278 | 142 | |||
2279 | 0 | 143 | ||
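The level walk in tree_from_taxonomy.py hinges on two slices: tlevels[0:tlevels.index(top_level)+1] keeps every rank up to and including the chosen top level, and [-2::-1] then visits those ranks from the second-highest down to species. A worked example with the tlevels list defined above:

    tlevels = ['species', 'genus', 'family', 'order', 'class', 'phylum', 'kingdom']
    top_level = 'order'
    levels_to_worry_about = tlevels[0:tlevels.index(top_level) + 1]
    # -> ['species', 'genus', 'family', 'order']
    print levels_to_worry_about[-2::-1]
    # -> ['family', 'genus', 'species']
    # i.e. attach families under the order nodes first, then genera under
    # families, then species under genera.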
2280 | === modified file 'stk/stk' | |||
2281 | --- stk/stk 2014-12-09 10:58:48 +0000 | |||
2282 | +++ stk/stk 2017-01-12 09:27:31 +0000 | |||
2283 | @@ -23,6 +23,7 @@ | |||
2284 | 23 | import sys | 23 | import sys |
2285 | 24 | import argparse | 24 | import argparse |
2286 | 25 | import traceback | 25 | import traceback |
2287 | 26 | import time | ||
2288 | 26 | try: | 27 | try: |
2289 | 27 | __file__ | 28 | __file__ |
2290 | 28 | except NameError: | 29 | except NameError: |
2291 | @@ -41,6 +42,10 @@ | |||
2292 | 41 | import string | 42 | import string |
2293 | 42 | import stk.p4 as p4 | 43 | import stk.p4 as p4 |
2294 | 43 | import lxml | 44 | import lxml |
2295 | 45 | import csv | ||
2296 | 46 | import tempfile | ||
2297 | 47 | from subprocess import check_call, CalledProcessError, call | ||
2298 | 48 | |||
2299 | 44 | import stk.bzr_version as bzr_version | 49 | import stk.bzr_version as bzr_version |
2300 | 45 | d = bzr_version.version_info | 50 | d = bzr_version.version_info |
2301 | 46 | build = d.get('revno','<unknown revno>') | 51 | build = d.get('revno','<unknown revno>') |
2302 | @@ -366,7 +371,7 @@ | |||
2303 | 366 | 371 | ||
2304 | 367 | # Clean data | 372 | # Clean data |
2305 | 368 | parser_cm = subparsers.add_parser('clean_data', | 373 | parser_cm = subparsers.add_parser('clean_data', |
2307 | 369 | help='Remove errant taxa, uninformative trees and empty sources.' | 374 | help='Renames all sources and trees sensibly. Removes errant taxa, uninformative trees and empty sources.' |
2308 | 370 | ) | 375 | ) |
2309 | 371 | parser_cm.add_argument('input', | 376 | parser_cm.add_argument('input', |
2310 | 372 | help='The input phyml file') | 377 | help='The input phyml file') |
2311 | @@ -488,7 +493,81 @@ | |||
2312 | 488 | parser_cm.add_argument('subs', | 493 | parser_cm.add_argument('subs', |
2313 | 489 | help='The subs file') | 494 | help='The subs file') |
2314 | 490 | parser_cm.set_defaults(func=check_subs) | 495 | parser_cm.set_defaults(func=check_subs) |
2316 | 491 | 496 | ||
2317 | 497 | # taxonomic name checker | ||
2318 | 498 | parser_cm = subparsers.add_parser('check_otus', | ||
2319 | 499 | help='Check your OTUs against EoL.' | ||
2320 | 500 | ) | ||
2321 | 501 | parser_cm.add_argument('input', | ||
2322 | 502 | help='The input Phyml. Also accepts tree files or a simple list') | ||
2323 | 503 | parser_cm.add_argument('output', | ||
2324 | 504 | help='The output CSV file. Taxon, synonyms, status') | ||
2325 | 505 | parser_cm.add_argument('--overwrite', | ||
2326 | 506 | action='store_true', | ||
2327 | 507 | default=False, | ||
2328 | 508 | help="Overwrite the existing file without asking for confirmation") | ||
2329 | 509 | parser_cm.set_defaults(func=check_otus) | ||
2330 | 510 | |||
2331 | 511 | # create taxonomy csv file | ||
2332 | 512 | parser_cm = subparsers.add_parser('create_taxonomy', | ||
2333 | 513 | help='Create a taxonomy file in CSV for you to then augment.' | ||
2334 | 514 | ) | ||
2335 | 515 | parser_cm.add_argument('input', | ||
2336 | 516 | help='The input Phyml. Also accepts tree files or a simple list') | ||
2337 | 517 | parser_cm.add_argument('output', | ||
2338 | 518 | help='The output CSV file. Name, followed by classification and source') | ||
2339 | 519 | parser_cm.add_argument('--overwrite', | ||
2340 | 520 | action='store_true', | ||
2341 | 521 | default=False, | ||
2342 | 522 | help="Overwrite the existing file without asking for confirmation") | ||
2343 | 523 | parser_cm.add_argument('--taxonomy', | ||
2344 | 524 | help="Give a starting taxonomy file, e.g. one you ran earlier",) | ||
2345 | 525 | parser_cm.set_defaults(func=create_taxonomy) | ||
2346 | 526 | |||
2347 | 527 | |||
2348 | 528 | # do the subs in one go using taxonomy | ||
2349 | 529 | parser_cm = subparsers.add_parser('auto_subs', | ||
2350 | 530 | help='Using a taxonomy, generate a species level version of your data in one go.' | ||
2351 | 531 | ) | ||
2352 | 532 | parser_cm.add_argument('input', | ||
2353 | 533 | help='The input Phyml') | ||
2354 | 534 | parser_cm.add_argument('taxonomy', | ||
2355 | 535 | help='Your taxonomy file', | ||
2356 | 536 | ) | ||
2357 | 537 | parser_cm.add_argument('output', | ||
2358 | 538 | help='The output phyml') | ||
2359 | 539 | parser_cm.add_argument('--overwrite', | ||
2360 | 540 | action='store_true', | ||
2361 | 541 | default=False, | ||
2362 | 542 | help="Overwrite the existing file without asking for confirmation") | ||
2363 | 543 | #parser_cm.add_argument('--level', | ||
2364 | 544 | # choices=supertree_toolkit.taxonomy_levels, | ||
2365 | 545 | # help="Taxonomic level to output at",) | ||
2366 | 546 | parser_cm.set_defaults(func=auto_subs) | ||
2367 | 547 | |||
2368 | 548 | |||
2369 | 549 | # attempt to process the data into a matrix all automatically | ||
2370 | 550 | parser_cm = subparsers.add_parser('process', | ||
2371 | 551 | help='Generate a species-level matrix, and do all the checks and processing automatically. Note this creates a taxonomy and does all the processing, but will not be perfect (as taxonomies are not perfect)' | ||
2372 | 552 | ) | ||
2373 | 553 | parser_cm.add_argument('input', | ||
2374 | 554 | help='The input Phyml') | ||
2375 | 555 | parser_cm.add_argument('output', | ||
2376 | 556 | help='The output matrix') | ||
2377 | 557 | parser_cm.add_argument('--taxonomy_file', | ||
2378 | 558 | help='Existing taxonomy file to prevent redownloading data. Any taxa not in the file will be checked online, so partial complete file are OK.') | ||
2379 | 559 | parser_cm.add_argument('--equivalents_file', | ||
2380 | 560 | help='Existing equivalents file from a taxonomic name check. Any taxa not in the file will be checked online, so partially complete files are OK.') | ||
2381 | 561 | parser_cm.add_argument('--overwrite', | ||
2382 | 562 | action='store_true', | ||
2383 | 563 | default=False, | ||
2384 | 564 | help="Overwrite the existing file without asking for confirmation") | ||
2385 | 565 | parser_cm.add_argument('--no_store', | ||
2386 | 566 | action="store_true", | ||
2387 | 567 | default=False, | ||
2388 | 568 | help="Do not store intermediate files -- not recommended") | ||
2389 | 569 | parser_cm.set_defaults(func=process) | ||
2390 | 570 | |||
2391 | 492 | 571 | ||
2392 | 493 | # before we let argparse work its magic, check for --version | 572 | # before we let argparse work its magic, check for --version |
2393 | 494 | if "--version" in sys.argv: | 573 | if "--version" in sys.argv: |
2394 | @@ -602,7 +681,7 @@ | |||
2395 | 602 | # check if output files are there | 681 | # check if output files are there |
2396 | 603 | if (output_file and os.path.exists(output_file) and not overwrite): | 682 | if (output_file and os.path.exists(output_file) and not overwrite): |
2397 | 604 | print "Output file exists. Either remove the file or use the --overwrite flag." | 683 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2399 | 605 | print "Do you wish to continue? [Y/n]" | 684 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2400 | 606 | while True: | 685 | while True: |
2401 | 607 | k=inkey() | 686 | k=inkey() |
2402 | 608 | if k.lower() == 'n': | 687 | if k.lower() == 'n': |
2403 | @@ -612,7 +691,7 @@ | |||
2404 | 612 | break | 691 | break |
2405 | 613 | if (not newphyml == None and os.path.exists(newphyml) and not overwrite): | 692 | if (not newphyml == None and os.path.exists(newphyml) and not overwrite): |
2406 | 614 | print "Output Phyml file exists. Either remove the file or use the --overwrite flag." | 693 | print "Output Phyml file exists. Either remove the file or use the --overwrite flag." |
2408 | 615 | print "Do you wish to continue? [Y/n]" | 694 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2409 | 616 | while True: | 695 | while True: |
2410 | 617 | k=inkey() | 696 | k=inkey() |
2411 | 618 | if k.lower() == 'n': | 697 | if k.lower() == 'n': |
2412 | @@ -624,9 +703,9 @@ | |||
2413 | 624 | XML = supertree_toolkit.load_phyml(input_file) | 703 | XML = supertree_toolkit.load_phyml(input_file) |
2414 | 625 | try: | 704 | try: |
2415 | 626 | if (newphyml == None): | 705 | if (newphyml == None): |
2417 | 627 | data_independence = supertree_toolkit.data_independence(XML,ignoreWarnings=ignoreWarnings) | 706 | data_independence, subsets = supertree_toolkit.data_independence(XML,ignoreWarnings=ignoreWarnings) |
2418 | 628 | else: | 707 | else: |
2420 | 629 | data_independence, new_phyml = supertree_toolkit.data_independence(XML,make_new_xml=True,ignoreWarnings=ignoreWarnings) | 708 | data_independence, subsets, new_phyml = supertree_toolkit.data_independence(XML,make_new_xml=True,ignoreWarnings=ignoreWarnings) |
2421 | 630 | except NotUniqueError as detail: | 709 | except NotUniqueError as detail: |
2422 | 631 | msg = "***Error: Failed to check independence.\n"+detail.msg | 710 | msg = "***Error: Failed to check independence.\n"+detail.msg |
2423 | 632 | print msg | 711 | print msg |
2424 | @@ -644,7 +723,7 @@ | |||
2425 | 644 | print msg | 723 | print msg |
2426 | 645 | return | 724 | return |
2427 | 646 | except: | 725 | except: |
2429 | 647 | msg = "***Error: failed to check independence due to unknown error." | 726 | msg = "***Error: failed to check independence due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" |
2430 | 648 | print msg | 727 | print msg |
2431 | 649 | traceback.print_exc() | 728 | traceback.print_exc() |
2432 | 650 | return | 729 | return |
2433 | @@ -653,16 +732,14 @@ | |||
2434 | 653 | data_ind = "" | 732 | data_ind = "" |
2435 | 654 | #column headers | 733 | #column headers |
2436 | 655 | data_ind = "Source trees that are subsets of others\n" | 734 | data_ind = "Source trees that are subsets of others\n" |
2441 | 656 | data_ind = data_ind + "Flagged tree, is a subset of:\n" | 735 | data_ind = data_ind + "Flagged tree(s), is/are subset(s) of:\n" |
2442 | 657 | for name in data_independence: | 736 | for names in subsets: |
2443 | 658 | if ( data_independence[name][1] == supertree_toolkit.SUBSET ): | 737 | data_ind += names[1:] + "," + names[0] + "\n" |
2440 | 659 | data_ind += name + "," + data_independence[name][0] + "\n" | ||
2444 | 660 | 738 | ||
2445 | 661 | data_ind += "\n\nSource trees that are identical to others\n" | 739 | data_ind += "\n\nSource trees that are identical to others\n" |
2450 | 662 | data_ind = data_ind + "Flagged tree, is identical to:\n" | 740 | data_ind = data_ind + "Flagged tree(s), is/are identical to:\n" |
2451 | 663 | for name in data_independence: | 741 | for names in data_independence: |
2452 | 664 | if ( data_independence[name][1] == supertree_toolkit.IDENTICAL ): | 742 | data_ind += names[1:] + "," + names[0] + "\n" |
2449 | 665 | data_ind += name + "," + data_independence[name][0] + "\n" | ||
2453 | 666 | 743 | ||
2454 | 667 | 744 | ||
2455 | 668 | if (output_file == False or | 745 | if (output_file == False or |
2456 | @@ -762,7 +839,7 @@ | |||
2457 | 762 | # Does the output file already exist? | 839 | # Does the output file already exist? |
2458 | 763 | if (os.path.exists(output_file) and not overwrite): | 840 | if (os.path.exists(output_file) and not overwrite): |
2459 | 764 | print "Output file exists. Either remove the file or use the --overwrite flag." | 841 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2461 | 765 | print "Do you wish to continue? [Y/n]" | 842 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2462 | 766 | while True: | 843 | while True: |
2463 | 767 | k=inkey() | 844 | k=inkey() |
2464 | 768 | if k.lower() == 'n': | 845 | if k.lower() == 'n': |
2465 | @@ -771,6 +848,7 @@ | |||
2466 | 771 | if k.lower() == 'y': | 848 | if k.lower() == 'y': |
2467 | 772 | break | 849 | break |
2468 | 773 | try: | 850 | try: |
2469 | 851 | |||
2470 | 774 | XML = supertree_toolkit.load_phyml(input_file) | 852 | XML = supertree_toolkit.load_phyml(input_file) |
2471 | 775 | input_is_xml = True | 853 | input_is_xml = True |
2472 | 776 | except: | 854 | except: |
2473 | @@ -896,7 +974,7 @@ | |||
2474 | 896 | # Does the output file already exist? | 974 | # Does the output file already exist? |
2475 | 897 | if (os.path.exists(output_file) and not overwrite): | 975 | if (os.path.exists(output_file) and not overwrite): |
2476 | 898 | print "Output file exists. Either remove the file or use the --overwrite flag." | 976 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2478 | 899 | print "Do you wish to continue? [Y/n]" | 977 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2479 | 900 | while True: | 978 | while True: |
2480 | 901 | k=inkey() | 979 | k=inkey() |
2481 | 902 | if k.lower() == 'n': | 980 | if k.lower() == 'n': |
2482 | @@ -942,7 +1020,7 @@ | |||
2483 | 942 | print msg | 1020 | print msg |
2484 | 943 | return | 1021 | return |
2485 | 944 | except: | 1022 | except: |
2487 | 945 | msg = "***Error: Failed sbstituting taxa due to unknown error.\n" | 1023 | msg = "***Error: Failed substituting taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
2488 | 946 | print msg | 1024 | print msg |
2489 | 947 | traceback.print_exc() | 1025 | traceback.print_exc() |
2490 | 948 | return | 1026 | return |
2491 | @@ -983,7 +1061,7 @@ | |||
2492 | 983 | 1061 | ||
2493 | 984 | if (os.path.exists(output_file) and not overwrite): | 1062 | if (os.path.exists(output_file) and not overwrite): |
2494 | 985 | print "Output file exists. Either remove the file or use the --overwrite flag." | 1063 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2496 | 986 | print "Do you wish to continue? [Y/n]" | 1064 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2497 | 987 | while True: | 1065 | while True: |
2498 | 988 | k=inkey() | 1066 | k=inkey() |
2499 | 989 | if k.lower() == 'n': | 1067 | if k.lower() == 'n': |
2500 | @@ -1013,7 +1091,7 @@ | |||
2501 | 1013 | print msg | 1091 | print msg |
2502 | 1014 | return | 1092 | return |
2503 | 1015 | except: | 1093 | except: |
2505 | 1016 | msg = "***Error: Failed sbstituting taxa due to unknown error.\n" | 1094 | msg = "***Error: Failed substituting taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
2506 | 1017 | print msg | 1095 | print msg |
2507 | 1018 | traceback.print_exc() | 1096 | traceback.print_exc() |
2508 | 1019 | return | 1097 | return |
2509 | @@ -1060,7 +1138,7 @@ | |||
2510 | 1060 | print msg | 1138 | print msg |
2511 | 1061 | return | 1139 | return |
2512 | 1062 | except: | 1140 | except: |
2514 | 1063 | msg = "***Error: Failed to export data due to unknown error.\n" | 1141 | msg = "***Error: Failed to export data due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2515 | 1064 | print msg | 1142 | print msg |
2516 | 1065 | traceback.print_exc() | 1143 | traceback.print_exc() |
2517 | 1066 | return | 1144 | return |
2518 | @@ -1115,7 +1193,7 @@ | |||
2519 | 1115 | print msg | 1193 | print msg |
2520 | 1116 | return | 1194 | return |
2521 | 1117 | except: | 1195 | except: |
2523 | 1118 | msg = "***Error: Failed to check overlap due to unknown error.\n" | 1196 | msg = "***Error: Failed to check overlap due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2524 | 1119 | print msg | 1197 | print msg |
2525 | 1120 | traceback.print_exc() | 1198 | traceback.print_exc() |
2526 | 1121 | return | 1199 | return |
2527 | @@ -1161,7 +1239,7 @@ | |||
2528 | 1161 | # check if output files are there | 1239 | # check if output files are there |
2529 | 1162 | if (output_file and os.path.exists(output_file) and not overwrite): | 1240 | if (output_file and os.path.exists(output_file) and not overwrite): |
2530 | 1163 | print "Output file exists. Either remove the file or use the --overwrite flag." | 1241 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2532 | 1164 | print "Do you wish to continue? [Y/n]" | 1242 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2533 | 1165 | while True: | 1243 | while True: |
2534 | 1166 | k=inkey() | 1244 | k=inkey() |
2535 | 1167 | if k.lower() == 'n': | 1245 | if k.lower() == 'n': |
2536 | @@ -1191,7 +1269,7 @@ | |||
2537 | 1191 | print msg | 1269 | print msg |
2538 | 1192 | return | 1270 | return |
2539 | 1193 | except: | 1271 | except: |
2541 | 1194 | msg = "***Error: Failed to export trees due to unknown error.\n" | 1272 | msg = "***Error: Failed to export trees due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2542 | 1195 | print msg | 1273 | print msg |
2543 | 1196 | traceback.print_exc() | 1274 | traceback.print_exc() |
2544 | 1197 | return | 1275 | return |
2545 | @@ -1220,7 +1298,7 @@ | |||
2546 | 1220 | # check if output files are there | 1298 | # check if output files are there |
2547 | 1221 | if (output_file and os.path.exists(output_file) and not overwrite): | 1299 | if (output_file and os.path.exists(output_file) and not overwrite): |
2548 | 1222 | print "Output file exists. Either remove the file or use the --overwrite flag." | 1300 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2550 | 1223 | print "Do you wish to continue? [Y/n]" | 1301 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2551 | 1224 | while True: | 1302 | while True: |
2552 | 1225 | k=inkey() | 1303 | k=inkey() |
2553 | 1226 | if k.lower() == 'n': | 1304 | if k.lower() == 'n': |
2554 | @@ -1309,7 +1387,7 @@ | |||
2555 | 1309 | print msg | 1387 | print msg |
2556 | 1310 | return | 1388 | return |
2557 | 1311 | except: | 1389 | except: |
2559 | 1312 | msg = "***Error: Failed to permute trees due to unknown error.\n" | 1390 | msg = "***Error: Failed to permute trees due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2560 | 1313 | print msg | 1391 | print msg |
2561 | 1314 | traceback.print_exc() | 1392 | traceback.print_exc() |
2562 | 1315 | return | 1393 | return |
2563 | @@ -1347,7 +1425,7 @@ | |||
2564 | 1347 | # check if output files are there | 1425 | # check if output files are there |
2565 | 1348 | if (os.path.exists(output_file) and not overwrite): | 1426 | if (os.path.exists(output_file) and not overwrite): |
2566 | 1349 | print "Output file exists. Either remove the file or use the --overwrite flag." | 1427 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2568 | 1350 | print "Do you wish to continue? [Y/n]" | 1428 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2569 | 1351 | while True: | 1429 | while True: |
2570 | 1352 | k=inkey() | 1430 | k=inkey() |
2571 | 1353 | if k.lower() == 'n': | 1431 | if k.lower() == 'n': |
2572 | @@ -1376,7 +1454,7 @@ | |||
2573 | 1376 | print msg | 1454 | print msg |
2574 | 1377 | return | 1455 | return |
2575 | 1378 | except: | 1456 | except: |
2577 | 1379 | msg = "***Error: Failed to clean data due to unknown error.\n" | 1457 | msg = "***Error: Failed to clean data due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2578 | 1380 | print msg | 1458 | print msg |
2579 | 1381 | traceback.print_exc() | 1459 | traceback.print_exc() |
2580 | 1382 | return | 1460 | return |
2581 | @@ -1404,7 +1482,7 @@ | |||
2582 | 1404 | # check if output files are there | 1482 | # check if output files are there |
2583 | 1405 | if (os.path.exists(output_file) and not overwrite): | 1483 | if (os.path.exists(output_file) and not overwrite): |
2584 | 1406 | print "Output file exists. Either remove the file or use the --overwrite flag." | 1484 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2586 | 1407 | print "Do you wish to continue? [Y/n]" | 1485 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2587 | 1408 | while True: | 1486 | while True: |
2588 | 1409 | k=inkey() | 1487 | k=inkey() |
2589 | 1410 | if k.lower() == 'n': | 1488 | if k.lower() == 'n': |
2590 | @@ -1433,7 +1511,7 @@ | |||
2591 | 1433 | print msg | 1511 | print msg |
2592 | 1434 | return | 1512 | return |
2593 | 1435 | except: | 1513 | except: |
2595 | 1436 | msg = "***Error: Failed to replace genera due to unknown error.\n" | 1514 | msg = "***Error: Failed to replace genera due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2596 | 1437 | print msg | 1515 | print msg |
2597 | 1438 | traceback.print_exc() | 1516 | traceback.print_exc() |
2598 | 1439 | return | 1517 | return |
2599 | @@ -1488,7 +1566,7 @@ | |||
2600 | 1488 | new_trees = {} | 1566 | new_trees = {} |
2601 | 1489 | i = 1 | 1567 | i = 1 |
2602 | 1490 | for t in trees: | 1568 | for t in trees: |
2604 | 1491 | new_trees['tree_'+str(i)] = t | 1569 | new_trees['tree_'+str(i)] = supertree_toolkit._collapse_nodes(t) |
2605 | 1492 | i += 1 | 1570 | i += 1 |
2606 | 1493 | output = supertree_toolkit._amalgamate_trees(new_trees,format=output_format) | 1571 | output = supertree_toolkit._amalgamate_trees(new_trees,format=output_format) |
2607 | 1494 | except TreeParseError as detail: | 1572 | except TreeParseError as detail: |
2608 | @@ -1503,7 +1581,7 @@ | |||
2609 | 1503 | # check if output files are there | 1581 | # check if output files are there |
2610 | 1504 | if (os.path.exists(output_file) and not overwrite): | 1582 | if (os.path.exists(output_file) and not overwrite): |
2611 | 1505 | print "Output file exists. Either remove the file or use the --overwrite flag." | 1583 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2613 | 1506 | print "Do you wish to continue? [Y/n]" | 1584 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2614 | 1507 | while True: | 1585 | while True: |
2615 | 1508 | k=inkey() | 1586 | k=inkey() |
2616 | 1509 | if k.lower() == 'n': | 1587 | if k.lower() == 'n': |
2617 | @@ -1540,7 +1618,7 @@ | |||
2618 | 1540 | # check if output files are there | 1618 | # check if output files are there |
2619 | 1541 | if (os.path.exists(output_file) and not overwrite): | 1619 | if (os.path.exists(output_file) and not overwrite): |
2620 | 1542 | print "Output file exists. Either remove the file or use the --overwrite flag." | 1620 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2622 | 1543 | print "Do you wish to continue? [Y/n]" | 1621 | print "Do you wish to continue and overwrite the file anyway? [Y/n]"
2623 | 1544 | while True: | 1622 | while True: |
2624 | 1545 | k=inkey() | 1623 | k=inkey() |
2625 | 1546 | if k.lower() == 'n': | 1624 | if k.lower() == 'n': |
2626 | @@ -1589,7 +1667,7 @@ | |||
2627 | 1589 | print msg | 1667 | print msg |
2628 | 1590 | return | 1668 | return |
2629 | 1591 | except: | 1669 | except: |
2631 | 1592 | msg = "***Error: Failed to create subset due to unknown error.\n" | 1670 | msg = "***Error: Failed to create subset due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2632 | 1593 | print msg | 1671 | print msg |
2633 | 1594 | traceback.print_exc() | 1672 | traceback.print_exc() |
2634 | 1595 | return | 1673 | return |
2635 | @@ -1637,6 +1715,681 @@ | |||
2636 | 1637 | print "**************************************************************\n" | 1715 | print "**************************************************************\n" |
2637 | 1638 | 1716 | ||
2638 | 1639 | 1717 | ||
2639 | 1718 | def check_otus(args): | ||
2640 | 1719 | """check out the OTUs in the Phyml - are they considered valid?""" | ||
2641 | 1720 | |||
2642 | 1721 | verbose = args.verbose | ||
2643 | 1722 | input_file = args.input | ||
2644 | 1723 | output_file = args.output | ||
2645 | 1724 | |||
2646 | 1725 | print input_file | ||
2647 | 1726 | if (input_file.endswith(".phyml")): | ||
2648 | 1727 | XML = supertree_toolkit.load_phyml(input_file) | ||
2649 | 1728 | try: | ||
2650 | 1729 | equivs = supertree_toolkit.taxonomic_checker(XML, verbose=verbose) | ||
2651 | 1730 | except NotUniqueError as detail: | ||
2652 | 1731 | msg = "***Error: Failed to check OTUs.\n"+detail.msg | ||
2653 | 1732 | print msg | ||
2654 | 1733 | return | ||
2655 | 1734 | except InvalidSTKData as detail: | ||
2656 | 1735 | msg = "***Error: Failed to check OTUs.\n"+detail.msg | ||
2657 | 1736 | print msg | ||
2658 | 1737 | return | ||
2659 | 1738 | except UninformativeTreeError as detail: | ||
2660 | 1739 | msg = "***Error: Failed to check OTUs.\n"+detail.msg | ||
2661 | 1740 | print msg | ||
2662 | 1741 | return | ||
2663 | 1742 | except TreeParseError as detail: | ||
2664 | 1743 | msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg | ||
2665 | 1744 | print msg | ||
2666 | 1745 | return | ||
2667 | 1746 | except: | ||
2668 | 1747 | # what about no internet connection? What error does that throw? | ||
2669 | 1748 | msg = "***Error: failed to check OTUs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" | ||
2670 | 1749 | print msg | ||
2671 | 1750 | traceback.print_exc() | ||
2672 | 1751 | return | ||
2673 | 1752 | elif (input_file.endswith(".txt") or input_file.endswith('.dat')): | ||
2674 | 1753 | # read file - assume one taxon per line | ||
2675 | 1754 | with open(input_file,'r') as f: | ||
2676 | 1755 | lines = f.read().splitlines() | ||
2677 | 1756 | equivs = supertree_toolkit.taxonomic_checker_list(lines, verbose=verbose) | ||
2678 | 1757 | else: | ||
2679 | 1758 | # assume a tree! | ||
2680 | 1759 | equivs = supertree_toolkit.taxonomic_checker_tree(input_file, verbose=verbose) | ||
2681 | 1760 | |||
2682 | 1761 | |||
2683 | 1762 | |||
2684 | 1763 | f = open(output_file,"w") | ||
2685 | 1764 | for taxon in sorted(equivs.keys()): | ||
2686 | 1765 | f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n") | ||
2687 | 1766 | f.close() | ||
2688 | 1767 | |||
2689 | 1768 | |||
2690 | 1769 | |||
2691 | 1770 | def create_taxonomy(args): | ||
2692 | 1771 | """create a taxonomic heirachy for each OTU in the Phyml""" | ||
2693 | 1772 | |||
2694 | 1773 | verbose = args.verbose | ||
2695 | 1774 | input_file = args.input | ||
2696 | 1775 | output_file = args.output | ||
2697 | 1776 | existing_taxonomy = args.taxonomy | ||
2698 | 1777 | ignoreWarnings = args.ignoreWarnings | ||
2699 | 1778 | |||
2700 | 1779 | XML = supertree_toolkit.load_phyml(input_file) | ||
2701 | 1780 | if (not existing_taxonomy == None): | ||
2702 | 1781 | existing_taxonomy = supertree_toolkit.load_taxonomy(existing_taxonomy) # load it in and create the dictionary | ||
2703 | 1782 | pass | ||
2704 | 1783 | |||
2705 | 1784 | try: | ||
2706 | 1785 | taxonomy = supertree_toolkit.create_taxonomy(XML,existing_taxonomy=existing_taxonomy,verbose=verbose,ignoreWarnings=ignoreWarnings) | ||
2707 | 1786 | except NotUniqueError as detail: | ||
2708 | 1787 | msg = "***Error: Failed to create taxonomy.\n"+detail.msg | ||
2709 | 1788 | print msg | ||
2710 | 1789 | return | ||
2711 | 1790 | except InvalidSTKData as detail: | ||
2712 | 1791 | msg = "***Error: Failed to create taxonomy.\n"+detail.msg | ||
2713 | 1792 | print msg | ||
2714 | 1793 | return | ||
2715 | 1794 | except UninformativeTreeError as detail: | ||
2716 | 1795 | msg = "***Error: Failed to create taxonomy.\n"+detail.msg | ||
2717 | 1796 | print msg | ||
2718 | 1797 | return | ||
2719 | 1798 | except TreeParseError as detail: | ||
2720 | 1799 | msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg | ||
2721 | 1800 | print msg | ||
2722 | 1801 | return | ||
2723 | 1802 | except: | ||
2724 | 1803 | # what about no internet connection? What error does that throw? | ||
2725 | 1804 | msg = "***Error: failed to create taxonomy due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" | ||
2726 | 1805 | print msg | ||
2727 | 1806 | traceback.print_exc() | ||
2728 | 1807 | return | ||
2729 | 1808 | |||
2730 | 1809 | # Now create the CSV output | ||
2731 | 1810 | with open(output_file, 'w') as f: | ||
2732 | 1811 | writer = csv.writer(f) | ||
2733 | 1812 | headers = [] | ||
2734 | 1813 | headers.append("OTU") | ||
2735 | 1814 | headers.extend(supertree_toolkit.taxonomy_levels) | ||
2736 | 1815 | headers.append("Data source") | ||
2737 | 1816 | writer.writerow(headers) | ||
2738 | 1817 | for t in taxonomy: | ||
2739 | 1818 | otu = t | ||
2740 | 1819 | try: | ||
2741 | 1820 | species = taxonomy[t]['species'] | ||
2742 | 1821 | except KeyError: | ||
2743 | 1822 | species = "-" | ||
2744 | 1823 | try: | ||
2745 | 1824 | genus = taxonomy[t]['genus'] | ||
2746 | 1825 | except KeyError: | ||
2747 | 1826 | genus = "-" | ||
2748 | 1827 | try: | ||
2749 | 1828 | family = taxonomy[t]['family'] | ||
2750 | 1829 | except KeyError: | ||
2751 | 1830 | family = "-" | ||
2752 | 1831 | try: | ||
2753 | 1832 | superfamily = taxonomy[t]['superfamily'] | ||
2754 | 1833 | except KeyError: | ||
2755 | 1834 | superfamily = "-" | ||
2756 | 1835 | try: | ||
2757 | 1836 | infraorder = taxonomy[t]['infraorder'] | ||
2758 | 1837 | except KeyError: | ||
2759 | 1838 | infraorder = "-" | ||
2760 | 1839 | try: | ||
2761 | 1840 | suborder = taxonomy[t]['suborder'] | ||
2762 | 1841 | except KeyError: | ||
2763 | 1842 | suborder = "-" | ||
2764 | 1843 | try: | ||
2765 | 1844 | order = taxonomy[t]['order'] | ||
2766 | 1845 | except KeyError: | ||
2767 | 1846 | order = "-" | ||
2768 | 1847 | try: | ||
2769 | 1848 | superorder = taxonomy[t]['superorder'] | ||
2770 | 1849 | except KeyError: | ||
2771 | 1850 | superorder = "-" | ||
2772 | 1851 | try: | ||
2773 | 1852 | subclass = taxonomy[t]['subclass'] | ||
2774 | 1853 | except KeyError: | ||
2775 | 1854 | subclass = "-" | ||
2776 | 1855 | try: | ||
2777 | 1856 | tclass = taxonomy[t]['class'] | ||
2778 | 1857 | except KeyError: | ||
2779 | 1858 | tclass = "-" | ||
2780 | 1859 | try: | ||
2781 | 1860 | subphylum = taxonomy[t]['subphylum'] | ||
2782 | 1861 | except KeyError: | ||
2783 | 1862 | subphylum = "-" | ||
2784 | 1863 | try: | ||
2785 | 1864 | phylum = taxonomy[t]['phylum'] | ||
2786 | 1865 | except KeyError: | ||
2787 | 1866 | phylum = "-" | ||
2788 | 1867 | try: | ||
2789 | 1868 | superphylum = taxonomy[t]['superphylum'] | ||
2790 | 1869 | except KeyError: | ||
2791 | 1870 | superphylum = "-" | ||
2792 | 1871 | try: | ||
2793 | 1872 | infrakingdom = taxonomy[t]['infrakingdom'] | ||
2794 | 1873 | except: | ||
2795 | 1874 | infrakingdom = "-" | ||
2796 | 1875 | try: | ||
2797 | 1876 | subkingdom = taxonomy[t]['subkingdom'] | ||
2798 | 1877 | except: | ||
2799 | 1878 | subkingdom = "-" | ||
2800 | 1879 | try: | ||
2801 | 1880 | kingdom = taxonomy[t]['kingdom'] | ||
2802 | 1881 | except KeyError: | ||
2803 | 1882 | kingdom = "-" | ||
2804 | 1883 | try: | ||
2805 | 1884 | provider = taxonomy[t]['provider'] | ||
2806 | 1885 | except KeyError: | ||
2807 | 1886 | provider = "-" | ||
2808 | 1887 | |||
2809 | 1888 | if (isinstance(species, list)): | ||
2810 | 1889 | species = " ".join(species) | ||
2811 | 1890 | this_classification = [ | ||
2812 | 1891 | otu.encode('utf-8'), | ||
2813 | 1892 | species.encode('utf-8'), | ||
2814 | 1893 | genus.encode('utf-8'), | ||
2815 | 1894 | family.encode('utf-8'), | ||
2816 | 1895 | superfamily.encode('utf-8'), | ||
2817 | 1896 | infraorder.encode('utf-8'), | ||
2818 | 1897 | suborder.encode('utf-8'), | ||
2819 | 1898 | order.encode('utf-8'), | ||
2820 | 1899 | superorder.encode('utf-8'), | ||
2821 | 1900 | subclass.encode('utf-8'), | ||
2822 | 1901 | tclass.encode('utf-8'), | ||
2823 | 1902 | subphylum.encode('utf-8'), | ||
2824 | 1903 | phylum.encode('utf-8'), | ||
2825 | 1904 | superphylum.encode('utf-8'), | ||
2826 | 1905 | infrakingdom.encode('utf-8'), | ||
2827 | 1906 | subkingdom.encode('utf-8'), | ||
2828 | 1907 | kingdom.encode('utf-8'), | ||
2829 | 1908 | provider.encode('utf-8')] | ||
2830 | 1909 | writer.writerow(this_classification) | ||
2831 | 1910 | |||
2832 | 1911 | def auto_subs(args): | ||
2833 | 1912 | """Get all OTUs to the same taxonomic level""" | ||
2834 | 1913 | |||
2835 | 1914 | |||
2836 | 1915 | verbose = args.verbose | ||
2837 | 1916 | input_file = args.input | ||
2838 | 1917 | output = args.output | ||
2839 | 1918 | taxonomy = args.taxonomy | ||
2840 | 1919 | ignoreWarnings = args.ignoreWarnings | ||
2841 | 1920 | overwrite = args.overwrite | ||
2842 | 1921 | if (os.path.exists(output) and not overwrite): | ||
2843 | 1922 | print "Output Phyml file exists. Either remove the file or use the --overwrite flag." | ||
2844 | 1923 | print "Do you wish to continue and overwrite the file anyway? [Y/n]" | ||
2845 | 1924 | while True: | ||
2846 | 1925 | k=inkey() | ||
2847 | 1926 | if k.lower() == 'n': | ||
2848 | 1927 | print "Exiting..." | ||
2849 | 1928 | sys.exit(0) | ||
2850 | 1929 | if k.lower() == 'y': | ||
2851 | 1930 | break | ||
2852 | 1931 | |||
2853 | 1932 | XML = supertree_toolkit.load_phyml(input_file) | ||
2854 | 1933 | taxonomy = supertree_toolkit.load_taxonomy(taxonomy) # load it in and create the dictionary | ||
2855 | 1934 | |||
2856 | 1935 | try: | ||
2857 | 1936 | newXML = supertree_toolkit.generate_species_level_data(XML,taxonomy,verbose=verbose,ignoreWarnings=ignoreWarnings) | ||
2858 | 1937 | except NotUniqueError as detail: | ||
2859 | 1938 | msg = "***Error: Failed to carry out auto subs.\n"+detail.msg | ||
2860 | 1939 | print msg | ||
2861 | 1940 | return | ||
2862 | 1941 | except InvalidSTKData as detail: | ||
2863 | 1942 | msg = "***Error: Failed to carry out auto subs.\n"+detail.msg | ||
2864 | 1943 | print msg | ||
2865 | 1944 | return | ||
2866 | 1945 | except UninformativeTreeError as detail: | ||
2867 | 1946 | msg = "***Error: Failed to carry out auto subs.\n"+detail.msg | ||
2868 | 1947 | print msg | ||
2869 | 1948 | return | ||
2870 | 1949 | except TreeParseError as detail: | ||
2871 | 1950 | msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg | ||
2872 | 1951 | print msg | ||
2873 | 1952 | return | ||
2874 | 1953 | except NoneCompleteTaxonomy as detail: | ||
2875 | 1954 | msg = "***Error: Failed to carry out auto subs.\n"+detail.msg | ||
2876 | 1955 | print msg | ||
2877 | 1956 | return | ||
2878 | 1957 | except: | ||
2879 | 1958 | # what about no internet connection? What error does that throw? | ||
2880 | 1959 | msg = "***Error: failed to carry out auto subs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" | ||
2881 | 1960 | print msg | ||
2882 | 1961 | traceback.print_exc() | ||
2883 | 1962 | return | ||
2884 | 1963 | |||
2885 | 1964 | f = open(output,"w") | ||
2886 | 1965 | f.write(newXML) | ||
2887 | 1966 | f.close() | ||
2888 | 1967 | |||
2889 | 1968 | def process(args): | ||
2890 | 1969 | |||
2891 | 1970 | verbose = args.verbose | ||
2892 | 1971 | input_file = args.input | ||
2893 | 1972 | output = args.output | ||
2894 | 1973 | no_store = args.no_store | ||
2895 | 1974 | ignoreWarnings = args.ignoreWarnings | ||
2896 | 1975 | taxonomy_file = args.taxonomy_file | ||
2897 | 1976 | equivalents_file = args.equivalents_file | ||
2898 | 1977 | overwrite = args.overwrite | ||
2899 | 1978 | |||
2900 | 1979 | if (os.path.exists(output) and not overwrite): | ||
2901 | 1980 | print "Output matrix file exists. Either remove the file or use the --overwrite flag." | ||
2902 | 1981 | print "Do you wish to continue and overwrite the file anyway? [Y/n]" | ||
2903 | 1982 | while True: | ||
2904 | 1983 | k=inkey() | ||
2905 | 1984 | if k.lower() == 'n': | ||
2906 | 1985 | print "Exiting..." | ||
2907 | 1986 | sys.exit(0) | ||
2908 | 1987 | if k.lower() == 'y': | ||
2909 | 1988 | break | ||
2910 | 1989 | |||
2911 | 1990 | filename = os.path.basename(input_file) | ||
2912 | 1991 | dirname = os.path.dirname(input_file) | ||
2913 | 1992 | |||
2914 | 1993 | if verbose: | ||
2915 | 1994 | print "Loading and checking your data" | ||
2916 | 1995 | # 0) load and check data | ||
2917 | 1996 | try: | ||
2918 | 1997 | phyml = supertree_toolkit.load_phyml(input_file) | ||
2919 | 1998 | project_name = supertree_toolkit.get_project_name(phyml) | ||
2920 | 1999 | supertree_toolkit._check_data(phyml) | ||
2921 | 2000 | except NotUniqueError as detail: | ||
2922 | 2001 | msg = "***Error: Failed to load data.\n"+detail.msg | ||
2923 | 2002 | print msg | ||
2924 | 2003 | return | ||
2925 | 2004 | except InvalidSTKData as detail: | ||
2926 | 2005 | msg = "***Error: Failed to load data.\n"+detail.msg | ||
2927 | 2006 | print msg | ||
2928 | 2007 | return | ||
2929 | 2008 | except UninformativeTreeError as detail: | ||
2930 | 2009 | msg = "***Error: Failed to load data.\n"+detail.msg | ||
2931 | 2010 | print msg | ||
2932 | 2011 | return | ||
2933 | 2012 | except TreeParseError as detail: | ||
2934 | 2013 | msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg | ||
2935 | 2014 | print msg | ||
2936 | 2015 | return | ||
2937 | 2016 | except: | ||
2938 | 2017 | msg = "***Error: Failed to load input due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" | ||
2939 | 2018 | print msg | ||
2940 | 2019 | traceback.print_exc() | ||
2941 | 2020 | return | ||
2942 | 2021 | |||
2943 | 2022 | if verbose: | ||
2944 | 2023 | print "Checking taxa against online databases" | ||
2945 | 2024 | # 1) taxonomy checker with autoreplace | ||
2946 | 2025 | # Load existing data if any: | ||
2947 | 2026 | if (not equivalents_file == None): | ||
2948 | 2027 | equivalents = supertree_toolkit.load_equivalents(equivalents_file) | ||
2949 | 2028 | else: | ||
2950 | 2029 | equivalents = None | ||
2951 | 2030 | equivalents = supertree_toolkit.taxonomic_checker(phyml,existing_data=equivalents,verbose=verbose) | ||
2952 | 2031 | # save the equivalents for later (as CSV and as sub file) | ||
2953 | 2032 | data_string_csv = _equivalents_to_csv(equivalents) | ||
2954 | 2033 | data_string_subs = _equivalents_to_subs(equivalents) | ||
2955 | 2034 | f = open(os.path.join(dirname,project_name+"_taxonomy_checker.csv"), "w") | ||
2956 | 2035 | f.write(data_string_csv) | ||
2957 | 2036 | f.close() | ||
2958 | 2037 | f = open(os.path.join(dirname,project_name+"_taxonomy_check_subs.dat"), "w") | ||
2959 | 2038 | f.write(data_string_subs) | ||
2960 | 2039 | f.close() | ||
2961 | 2040 | |||
2962 | 2041 | # now do the replacements - we use the subs file :) | ||
2963 | 2042 | if verbose: | ||
2964 | 2043 | print "Swapping in the corrected taxa names" | ||
2965 | 2044 | try: | ||
2966 | 2045 | old_taxa, new_taxa = supertree_toolkit.parse_subs_file(os.path.join(dirname,project_name+"_taxonomy_check_subs.dat")) | ||
2967 | 2046 | except UnableToParseSubsFile as e: | ||
2968 | 2047 | print e.msg | ||
2969 | 2048 | sys.exit(-1) | ||
2970 | 2049 | try: | ||
2971 | 2050 | phyml = supertree_toolkit.substitute_taxa(phyml,old_taxa,new_taxa,only_existing=False,verbose=verbose) | ||
2972 | 2051 | except NotUniqueError as detail: | ||
2973 | 2052 | msg = "***Error: Failed to substitute taxa.\n"+detail.msg | ||
2974 | 2053 | print msg | ||
2975 | 2054 | return | ||
2976 | 2055 | except InvalidSTKData as detail: | ||
2977 | 2056 | msg = "***Error: Failed substituting taxa.\n"+detail.msg | ||
2978 | 2057 | print msg | ||
2979 | 2058 | return | ||
2980 | 2059 | except UninformativeTreeError as detail: | ||
2981 | 2060 | msg = "***Error: Failed to substitute taxa.\n"+detail.msg | ||
2982 | 2061 | print msg | ||
2983 | 2062 | return | ||
2984 | 2063 | except TreeParseError as detail: | ||
2985 | 2064 | msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg | ||
2986 | 2065 | print msg | ||
2987 | 2066 | return | ||
2988 | 2067 | except: | ||
2989 | 2068 | msg = "***Error: Failed substituting taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" | ||
2990 | 2069 | print msg | ||
2991 | 2070 | traceback.print_exc() | ||
2992 | 2071 | return | ||
2993 | 2072 | # save phyml as intermediate step | ||
2994 | 2073 | f = open(os.path.join(dirname,project_name+"_taxonomy_checked.phyml"), "w") | ||
2995 | 2074 | f.write(phyml) | ||
2996 | 2075 | f.close() | ||
2997 | 2076 | |||
2998 | 2077 | |||
2999 | 2078 | if verbose: | ||
3000 | 2079 | print "Creating taxonomic information" | ||
3001 | 2080 | # 2) create taxonomy | ||
3002 | 2081 | if (not taxonomy_file == None): | ||
3003 | 2082 | taxonomy = supertree_toolkit.load_taxonomy(taxonomy_file) | ||
3004 | 2083 | else: | ||
3005 | 2084 | taxonomy = None | ||
3006 | 2085 | taxonomy = supertree_toolkit.create_taxonomy(phyml,existing_taxonomy=taxonomy,verbose=verbose) | ||
3007 | 2086 | # save the taxonomy for later | ||
3008 | 2087 | # Now create the CSV output - separate out into a function in STK (used several times) | ||
3009 | 2088 | with open(os.path.join(dirname,project_name+"_taxonomy.csv"), 'w') as f: | ||
3010 | 2089 | writer = csv.writer(f) | ||
3011 | 2090 | headers = [] | ||
3012 | 2091 | headers.append("OTU") | ||
3013 | 2092 | headers.extend(supertree_toolkit.taxonomy_levels) | ||
3014 | 2093 | headers.append("Data source") | ||
3015 | 2094 | writer.writerow(headers) | ||
3016 | 2095 | for t in taxonomy: | ||
3017 | 2096 | otu = t | ||
3018 | 2097 | try: | ||
3019 | 2098 | species = taxonomy[t]['species'] | ||
3020 | 2099 | except KeyError: | ||
3021 | 2100 | species = "-" | ||
3022 | 2101 | try: | ||
3023 | 2102 | subgenus = taxonomy[t]['subgenus'] | ||
3024 | 2103 | except KeyError: | ||
3025 | 2104 | subgenus = "-" | ||
3026 | 2105 | try: | ||
3027 | 2106 | genus = taxonomy[t]['genus'] | ||
3028 | 2107 | except KeyError: | ||
3029 | 2108 | genus = "-" | ||
3030 | 2109 | try: | ||
3031 | 2110 | subfamily = taxonomy[t]['subfamily'] | ||
3032 | 2111 | except KeyError: | ||
3033 | 2112 | subfamily = "-" | ||
3034 | 2113 | try: | ||
3035 | 2114 | family = taxonomy[t]['family'] | ||
3036 | 2115 | except KeyError: | ||
3037 | 2116 | family = "-" | ||
3038 | 2117 | try: | ||
3039 | 2118 | superfamily = taxonomy[t]['superfamily'] | ||
3040 | 2119 | except KeyError: | ||
3041 | 2120 | superfamily = "-" | ||
3042 | 2121 | try: | ||
3043 | 2122 | subsection = taxonomy[t]['subsection'] | ||
3044 | 2123 | except KeyError: | ||
3045 | 2124 | subsection = "-" | ||
3046 | 2125 | try: | ||
3047 | 2126 | section = taxonomy[t]['section'] | ||
3048 | 2127 | except KeyError: | ||
3049 | 2128 | section = "-" | ||
3050 | 2129 | try: | ||
3051 | 2130 | infraorder = taxonomy[t]['infraorder'] | ||
3052 | 2131 | except KeyError: | ||
3053 | 2132 | infraorder = "-" | ||
3054 | 2133 | try: | ||
3055 | 2134 | suborder = taxonomy[t]['suborder'] | ||
3056 | 2135 | except KeyError: | ||
3057 | 2136 | suborder = "-" | ||
3058 | 2137 | try: | ||
3059 | 2138 | order = taxonomy[t]['order'] | ||
3060 | 2139 | except KeyError: | ||
3061 | 2140 | order = "-" | ||
3062 | 2141 | try: | ||
3063 | 2142 | superorder = taxonomy[t]['superorder'] | ||
3064 | 2143 | except KeyError: | ||
3065 | 2144 | superorder = "-" | ||
3066 | 2145 | try: | ||
3067 | 2146 | subclass = taxonomy[t]['subclass'] | ||
3068 | 2147 | except KeyError: | ||
3069 | 2148 | subclass = "-" | ||
3070 | 2149 | try: | ||
3071 | 2150 | tclass = taxonomy[t]['class'] | ||
3072 | 2151 | except KeyError: | ||
3073 | 2152 | tclass = "-" | ||
3074 | 2153 | try: | ||
3075 | 2154 | superclass = taxonomy[t]['superclass'] | ||
3076 | 2155 | except KeyError: | ||
3077 | 2156 | superclass = "-" | ||
3078 | 2157 | try: | ||
3079 | 2158 | subphylum = taxonomy[t]['subphylum'] | ||
3080 | 2159 | except KeyError: | ||
3081 | 2160 | subphylum = "-" | ||
3082 | 2161 | try: | ||
3083 | 2162 | phylum = taxonomy[t]['phylum'] | ||
3084 | 2163 | except KeyError: | ||
3085 | 2164 | phylum = "-" | ||
3086 | 2165 | try: | ||
3087 | 2166 | superphylum = taxonomy[t]['superphylum'] | ||
3088 | 2167 | except KeyError: | ||
3089 | 2168 | superphylum = "-" | ||
3090 | 2169 | try: | ||
3091 | 2170 | infrakingdom = taxonomy[t]['infrakingdom'] | ||
3092 | 2171 | except: | ||
3093 | 2172 | infrakingdom = "-" | ||
3094 | 2173 | try: | ||
3095 | 2174 | subkingdom = taxonomy[t]['subkingdom'] | ||
3096 | 2175 | except: | ||
3097 | 2176 | subkingdom = "-" | ||
3098 | 2177 | try: | ||
3099 | 2178 | kingdom = taxonomy[t]['kingdom'] | ||
3100 | 2179 | except KeyError: | ||
3101 | 2180 | kingdom = "-" | ||
3102 | 2181 | try: | ||
3103 | 2182 | provider = taxonomy[t]['provider'] | ||
3104 | 2183 | except KeyError: | ||
3105 | 2184 | provider = "-" | ||
3106 | 2185 | this_classification = [ | ||
3107 | 2186 | otu.encode('utf-8'), | ||
3108 | 2187 | species.encode('utf-8'), | ||
3109 | 2188 | subgenus.encode('utf-8'), | ||
3110 | 2189 | genus.encode('utf-8'), | ||
3111 | 2190 | subfamily.encode('utf-8'), | ||
3112 | 2191 | family.encode('utf-8'), | ||
3113 | 2192 | superfamily.encode('utf-8'), | ||
3114 | 2193 | subsection.encode('utf-8'), | ||
3115 | 2194 | section.encode('utf-8'), | ||
3116 | 2195 | infraorder.encode('utf-8'), | ||
3117 | 2196 | suborder.encode('utf-8'), | ||
3118 | 2197 | order.encode('utf-8'), | ||
3119 | 2198 | superorder.encode('utf-8'), | ||
3120 | 2199 | subclass.encode('utf-8'), | ||
3121 | 2200 | tclass.encode('utf-8'), | ||
3122 | 2201 | superclass.encode('utf-8'), | ||
3123 | 2202 | subphylum.encode('utf-8'), | ||
3124 | 2203 | phylum.encode('utf-8'), | ||
3125 | 2204 | superphylum.encode('utf-8'), | ||
3126 | 2205 | infrakingdom.encode('utf-8'), | ||
3127 | 2206 | subkingdom.encode('utf-8'), | ||
3128 | 2207 | kingdom.encode('utf-8'), | ||
3129 | 2208 | provider.encode('utf-8')] | ||
3130 | 2209 | writer.writerow(this_classification) | ||
3131 | 2210 | |||
3132 | 2211 | # 3) create species level dataset | ||
3133 | 2212 | if verbose: | ||
3134 | 2213 | print "Converting data to species level" | ||
3135 | 2214 | try: | ||
3136 | 2215 | phyml = supertree_toolkit.generate_species_level_data(phyml,taxonomy,verbose=verbose) | ||
3137 | 2216 | except NotUniqueError as detail: | ||
3138 | 2217 | msg = "***Error: Failed to carry out auto subs.\n"+detail.msg | ||
3139 | 2218 | print msg | ||
3140 | 2219 | return | ||
3141 | 2220 | except InvalidSTKData as detail: | ||
3142 | 2221 | msg = "***Error: Failed to carry out auto subs.\n"+detail.msg | ||
3143 | 2222 | print msg | ||
3144 | 2223 | return | ||
3145 | 2224 | except UninformativeTreeError as detail: | ||
3146 | 2225 | msg = "***Error: Failed to carry out auto subs.\n"+detail.msg | ||
3147 | 2226 | print msg | ||
3148 | 2227 | return | ||
3149 | 2228 | except TreeParseError as detail: | ||
3150 | 2229 | msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg | ||
3151 | 2230 | print msg | ||
3152 | 2231 | return | ||
3153 | 2232 | except NoneCompleteTaxonomy as detail: | ||
3154 | 2233 | msg = "***Error: Failed to carry out auto subs.\n"+detail.msg | ||
3155 | 2234 | print msg | ||
3156 | 2235 | return | ||
3157 | 2236 | except: | ||
3158 | 2237 | # what about no internet connection? What error does that throw? | ||
3159 | 2238 | msg = "***Error: failed to carry out auto subs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" | ||
3160 | 2239 | print msg | ||
3161 | 2240 | traceback.print_exc() | ||
3162 | 2241 | return | ||
3163 | 2242 | # save the phyml as intermediate step | ||
3164 | 2243 | f = open(os.path.join(dirname,project_name+"_species_level.phyml"), "w") | ||
3165 | 2244 | f.write(phyml) | ||
3166 | 2245 | f.close() | ||
3167 | 2246 | |||
3168 | 2247 | # 4) Remove non-monophyletic taxa (requires TNT to be installed) | ||
3169 | 2248 | if verbose: | ||
3170 | 2249 | print "Removing non-monophyletic taxa via mini-supertree method" | ||
3171 | 2250 | tree_list = supertree_toolkit._find_trees_for_permuting(phyml) | ||
3172 | 2251 | try: | ||
3173 | 2252 | for t in tree_list: | ||
3174 | 2253 | # permute | ||
3175 | 2254 | output_string = supertree_toolkit.permute_tree(tree_list[t],matrix='hennig',treefile=None,verbose=verbose) | ||
3176 | 2255 | #save | ||
3177 | 2256 | if (not output_string == ""): | ||
3178 | 2257 | file_name = os.path.basename(filename) | ||
3179 | 2258 | dirname = os.path.dirname(input_file) | ||
3180 | 2259 | new_output = os.path.join(dirname,t,t+"_matrix.tnt") | ||
3181 | 2260 | try: | ||
3182 | 2261 | os.makedirs(os.path.join(dirname,t)) | ||
3183 | 2262 | except OSError: | ||
3184 | 2263 | if not os.path.isdir(os.path.join(dirname,t)): | ||
3185 | 2264 | raise | ||
3186 | 2265 | f = open(new_output,'w',0) | ||
3187 | 2266 | f.write(output_string) | ||
3188 | 2267 | f.close() | ||
3189 | 2268 | time.sleep(1) | ||
3190 | 2269 | |||
3191 | 2270 | # now create the tnt command to deal with this | ||
3192 | 2271 | # create a tmp file for the output tree | ||
3193 | 2272 | temp_file_handle, temp_file = tempfile.mkstemp(suffix=".tnt") | ||
3194 | 2273 | tnt_command = "tnt mxram 512,run "+new_output+",echo= ,timeout 00:10:00,rseed0,rseed*,hold 1000,xmult= level 0,taxname=,nelsen *,tsave *"+temp_file+",save /,quit" | ||
3195 | 2274 | #tnt_command = "tnt run "+new_output+",ienum,taxname=,nelsen*,tsave *"+temp_file+",save /,quit" | ||
3196 | 2275 | # run tnt, grab the output and store back in the data | ||
3197 | 2276 | #try: | ||
3198 | 2277 | call(tnt_command, shell=True) | ||
3199 | 2278 | #except CalledProcessError as e: | ||
3200 | 2279 | # msg = "***Error: Failed to run TNT. Is it installed correctly?\n"+e.msg | ||
3201 | 2280 | # print msg | ||
3202 | 2281 | # return | ||
3203 | 2282 | #ret = os.system(tnt_command) | ||
3204 | 2283 | #if (not ret == 0): | ||
3205 | 2284 | # print "error running tnt" | ||
3206 | 2285 | # return | ||
3207 | 2286 | |||
3208 | 2287 | new_tree = supertree_toolkit.import_tree(temp_file) | ||
3209 | 2288 | phyml = supertree_toolkit._swap_tree_in_XML(phyml,new_tree,t) | ||
3210 | 2289 | |||
3211 | 2290 | except TreeParseError as e: | ||
3212 | 2291 | msg = "***Error permuting trees.\n"+e.msg | ||
3213 | 2292 | print msg | ||
3214 | 2293 | return | ||
3215 | 2294 | |||
3216 | 2295 | #4.5) remove MRP_Outgroups | ||
3217 | 2296 | phyml = supertree_toolkit.substitute_taxa(phyml,'MRP_Outgroup') | ||
3218 | 2297 | phyml = supertree_toolkit.substitute_taxa(phyml,'MRPOutgroup') | ||
3219 | 2298 | phyml = supertree_toolkit.substitute_taxa(phyml,'MRP_outgroup') | ||
3220 | 2299 | phyml = supertree_toolkit.substitute_taxa(phyml,'MRPoutgroup') | ||
3221 | 2300 | phyml = supertree_toolkit.substitute_taxa(phyml,'MRPOUTGROUP') | ||
3222 | 2301 | |||
3223 | 2302 | # save intermediate phyml | ||
3224 | 2303 | f = open(os.path.join(dirname,project_name+"_nonmonophyl_removed.phyml"), "w") | ||
3225 | 2304 | f.write(phyml) | ||
3226 | 2305 | f.close() | ||
3227 | 2306 | |||
3228 | 2307 | |||
3229 | 2308 | # 5) Remove common names | ||
3230 | 2309 | # no function to do this yet... | ||
3231 | 2310 | |||
3232 | 2311 | # 6) Data independence | ||
3233 | 2312 | if verbose: | ||
3234 | 2313 | print "Checking data independence" | ||
3235 | 2314 | data_ind,subsets,phyml = supertree_toolkit.data_independence(phyml,make_new_xml=True) | ||
3236 | 2315 | # save phyml | ||
3237 | 2316 | f = open(os.path.join(dirname,project_name+"_data_ind.phyml"), "w") | ||
3238 | 2317 | f.write(phyml) | ||
3239 | 2318 | f.close() | ||
3240 | 2319 | |||
3241 | 2320 | # 7) Data overlap | ||
3242 | 2321 | if verbose: | ||
3243 | 2322 | print "Checking data overlap" | ||
3244 | 2323 | sufficient_overlap, key_list = supertree_toolkit.data_overlap(phyml,verbose=verbose) | ||
3245 | 2324 | # process the key_list to remove the unconnected trees | ||
3246 | 2325 | if not sufficient_overlap: | ||
3247 | 2326 | # we don't have enough, so remove all but the largest group. | ||
3248 | 2327 | # the key contains a list, with the largest group first (thanks networkX!) | ||
3249 | 2328 | # we can therefore just remove trees from everything but the first in the list | ||
3250 | 2329 | delete_me = [] | ||
3251 | 2330 | for t in key_list[1::]: # skip 0 | ||
3252 | 2331 | delete_me.extend(t) | ||
3253 | 2332 | for tree in delete_me: | ||
3254 | 2333 | phyml = supertree_toolkit._swap_tree_in_XML(phyml, None, tree, delete=True) # delete the tree and clean the data as we go | ||
3255 | 2334 | # save phyml | ||
3256 | 2335 | f = open(os.path.join(dirname,project_name+"_data_tax_overlap.phyml"), "w") | ||
3257 | 2336 | f.write(phyml) | ||
3258 | 2337 | f.close() | ||
3259 | 2338 | |||
3260 | 2339 | |||
3261 | 2340 | # 8) Create matrix | ||
3262 | 2341 | if verbose: | ||
3263 | 2342 | print "Creating matrix" | ||
3264 | 2343 | try: | ||
3265 | 2344 | matrix = supertree_toolkit.create_matrix(phyml) | ||
3266 | 2345 | except NotUniqueError as detail: | ||
3267 | 2346 | msg = "***Error: Failed to create matrix.\n"+detail.msg | ||
3268 | 2347 | print msg | ||
3269 | 2348 | return | ||
3270 | 2349 | except InvalidSTKData as detail: | ||
3271 | 2350 | msg = "***Error: Failed to create matrix.\n"+detail.msg | ||
3272 | 2351 | print msg | ||
3273 | 2352 | return | ||
3274 | 2353 | except UninformativeTreeError as detail: | ||
3275 | 2354 | msg = "***Error: Failed to create matrix.\n"+detail.msg | ||
3276 | 2355 | print msg | ||
3277 | 2356 | return | ||
3278 | 2357 | except TreeParseError as detail: | ||
3279 | 2358 | msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg | ||
3280 | 2359 | print msg | ||
3281 | 2360 | return | ||
3282 | 2361 | except: | ||
3283 | 2362 | msg = "***Error: Failed to create matrix due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" | ||
3284 | 2363 | print msg | ||
3285 | 2364 | traceback.print_exc() | ||
3286 | 2365 | return | ||
3287 | 2366 | |||
3288 | 2367 | f = open(output, "w") | ||
3289 | 2368 | f.write(matrix) | ||
3290 | 2369 | f.close() | ||
3291 | 2370 | |||
3292 | 2371 | return | ||
3293 | 2372 | |||
3294 | 2373 | |||
3295 | 2374 | def _equivalents_to_csv(equivalents): | ||
3296 | 2375 | |||
3297 | 2376 | output_string = 'Taxa,Equivalents,Status\n' | ||
3298 | 2377 | |||
3299 | 2378 | for taxon in sorted(equivalents): | ||
3300 | 2379 | output_string += taxon + "," + ';'.join(equivalents[taxon][0]) + "," + equivalents[taxon][1] + "\n" | ||
3301 | 2380 | |||
3302 | 2381 | return output_string | ||
3303 | 2382 | |||
3304 | 2383 | |||
3305 | 2384 | def _equivalents_to_subs(equivalents): | ||
3306 | 2385 | """Only corrects the yellow ones. Red and green are left alone""" | ||
3307 | 2386 | |||
3308 | 2387 | output_string = "" | ||
3309 | 2388 | for taxon in sorted(equivalents): | ||
3310 | 2389 | if (equivalents[taxon][1] == 'yellow'): | ||
3311 | 2390 | # the first name is always the correct one | ||
3312 | 2391 | output_string += taxon + " = "+equivalents[taxon][0][0]+"\n" | ||
3313 | 2392 | return output_string | ||
3314 | 1640 | 2393 | ||
3315 | 1641 | if __name__ == "__main__": | 2394 | if __name__ == "__main__": |
3316 | 1642 | main() | 2395 | main() |
3317 | 1643 | 2396 | ||
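The new process subcommand above chains the taxonomy-aware steps end to end: name checking, taxonomy creation, species-level substitution, independence and overlap checks, then matrix export. A condensed sketch of the same pipeline as direct library calls, skipping the name substitutions, intermediate file writes and TNT-based pruning of non-monophyletic taxa; the import alias and the file names data.phyml / matrix.tnt are illustrative only:

    import stk.supertree_toolkit as supertree_toolkit

    phyml = supertree_toolkit.load_phyml("data.phyml")
    equivalents = supertree_toolkit.taxonomic_checker(phyml, verbose=True)     # OTU names vs EoL; feeds the auto-generated subs file
    taxonomy = supertree_toolkit.create_taxonomy(phyml, existing_taxonomy=None, verbose=True)
    phyml = supertree_toolkit.generate_species_level_data(phyml, taxonomy, verbose=True)
    data_ind, subsets, phyml = supertree_toolkit.data_independence(phyml, make_new_xml=True)
    overlap_ok, groups = supertree_toolkit.data_overlap(phyml, verbose=True)   # enough taxonomic overlap between trees?
    matrix = supertree_toolkit.create_matrix(phyml)
    with open("matrix.tnt", "w") as f:
        f.write(matrix)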
3318 | === modified file 'stk/stk_exceptions.py' | |||
3319 | --- stk/stk_exceptions.py 2013-10-22 08:26:54 +0000 | |||
3320 | +++ stk/stk_exceptions.py 2017-01-12 09:27:31 +0000 | |||
3321 | @@ -134,4 +134,12 @@ | |||
3322 | 134 | def __init__(self, msg): | 134 | def __init__(self, msg): |
3323 | 135 | self.msg = msg | 135 | self.msg = msg |
3324 | 136 | 136 | ||
3325 | 137 | class NoneCompleteTaxonomy(Error): | ||
3326 | 138 | """Exception raised when a taxonomy is not complete for these data | ||
3327 | 139 | Attributes: | ||
3328 | 140 | msg -- explanation of error | ||
3329 | 141 | """ | ||
3330 | 142 | |||
3331 | 143 | def __init__(self, msg): | ||
3332 | 144 | self.msg = msg | ||
3333 | 137 | 145 | ||
3334 | 138 | 146 | ||
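The new NoneCompleteTaxonomy exception follows the pattern of the existing error classes: it carries only a msg attribute and is caught by name in the stk/stk handlers above. A small usage sketch; the helper function, import path and message text are illustrative only, not part of this patch:

    from stk.stk_exceptions import NoneCompleteTaxonomy

    def require_full_taxonomy(taxonomy, taxa):
        # hypothetical helper: refuse to continue if any taxon lacks a taxonomy entry
        missing = [t for t in taxa if t not in taxonomy]
        if missing:
            raise NoneCompleteTaxonomy("No taxonomy entry for: " + ", ".join(missing))

    try:
        require_full_taxonomy({"Falco_peregrinus": {}}, ["Falco_peregrinus", "Buteo_buteo"])
    except NoneCompleteTaxonomy as detail:
        print "***Error: Failed to carry out auto subs.\n" + detail.msg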
3335 | === modified file 'stk/supertree_toolkit.py' | |||
3336 | --- stk/supertree_toolkit.py 2017-01-11 15:16:21 +0000 | |||
3337 | +++ stk/supertree_toolkit.py 2017-01-12 09:27:31 +0000 | |||
3338 | @@ -44,15 +44,49 @@ | |||
3339 | 44 | import unicodedata | 44 | import unicodedata |
3340 | 45 | from stk_internals import * | 45 | from stk_internals import * |
3341 | 46 | from copy import deepcopy | 46 | from copy import deepcopy |
3342 | 47 | import Queue | ||
3343 | 48 | import threading | ||
3344 | 49 | import urllib2 | ||
3345 | 50 | from urllib import quote_plus | ||
3346 | 51 | import simplejson as json | ||
3347 | 52 | import time | ||
3348 | 47 | import types | 53 | import types |
3349 | 48 | 54 | ||
3350 | 49 | #plt.ion() | 55 | #plt.ion() |
3351 | 50 | 56 | ||
3352 | 57 | sys.setrecursionlimit(50000) | ||
3353 | 51 | # GLOBAL VARIABLES | 58 | # GLOBAL VARIABLES |
3354 | 52 | IDENTICAL = 0 | 59 | IDENTICAL = 0 |
3355 | 53 | SUBSET = 1 | 60 | SUBSET = 1 |
3356 | 54 | PLATFORM = sys.platform | 61 | PLATFORM = sys.platform |
3358 | 55 | taxonomy_levels = ['species','genus','family','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] | 62 | #Logging |
3359 | 63 | import logging | ||
3360 | 64 | logging.basicConfig(filename='supertreetoolkit.log', level=logging.DEBUG, format='%(asctime)s %(levelname)s:%(message)s', datefmt='%m/%d/%Y %I:%M:%S %p') | ||
3361 | 65 | |||
3362 | 66 | # taxonomy levels | ||
3363 | 67 | # What we get from EOL | ||
3364 | 68 | current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom'] | ||
3365 | 69 | # And the extra ones from ITIS | ||
3366 | 70 | extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom'] | ||
3367 | 71 | # all of them in order | ||
3368 | 72 | taxonomy_levels = ['species','subgenus','genus','tribe','subfamily','family','superfamily','subsection','section','parvorder','infraorder','suborder','order','superorder','subclass','class','superclass','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] | ||
3369 | 73 | |||
3370 | 74 | SPECIES = taxonomy_levels[0] | ||
3371 | 75 | GENUS = taxonomy_levels[2] | ||
3372 | 76 | FAMILY = taxonomy_levels[5] | ||
3373 | 77 | SUPERFAMILY = taxonomy_levels[6] | ||
3374 | 78 | INFRAORDER = taxonomy_levels[10] | ||
3375 | 79 | SUBORDER = taxonomy_levels[11] | ||
3376 | 80 | ORDER = taxonomy_levels[12] | ||
3377 | 81 | SUPERORDER = taxonomy_levels[13] | ||
3378 | 82 | SUBCLASS = taxonomy_levels[14] | ||
3379 | 83 | CLASS = taxonomy_levels[15] | ||
3380 | 84 | SUBPHYLUM = taxonomy_levels[17] | ||
3381 | 85 | PHYLUM = taxonomy_levels[18] | ||
3382 | 86 | SUPERPHYLUM = taxonomy_levels[19] | ||
3383 | 87 | INFRAKINGDOM = taxonomy_levels[20] | ||
3384 | 88 | SUBKINGDOM = taxonomy_levels[21] | ||
3385 | 89 | KINGDOM = taxonomy_levels[22] | ||
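The numeric indices above have to track the expanded taxonomy_levels list; a quick check at a Python prompt confirms where a couple of the ranks now sit:

    >>> taxonomy_levels.index('genus')
    2
    >>> taxonomy_levels.index('family')
    5
    >>> taxonomy_levels.index('kingdom')
    22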
3386 | 56 | 90 | ||
3387 | 57 | # supertree_toolkit is the backend for the STK. Loaded by both the GUI and | 91 | # supertree_toolkit is the backend for the STK. Loaded by both the GUI and |
3388 | 58 | # CLI, this contains all the functions to actually *do* something | 92 | # CLI, this contains all the functions to actually *do* something |
3389 | @@ -60,6 +94,17 @@ | |||
3390 | 60 | # All functions take XML and a list of other arguments, process the data and return | 94 | # All functions take XML and a list of other arguments, process the data and return |
3391 | 61 | # it back to the user interface handler to save it somewhere | 95 | # it back to the user interface handler to save it somewhere |
3392 | 62 | 96 | ||
3393 | 97 | |||
3394 | 98 | def get_project_name(XML): | ||
3395 | 99 | """ | ||
3396 | 100 | Get the name of the dataset currently being worked on | ||
3397 | 101 | """ | ||
3398 | 102 | |||
3399 | 103 | xml_root = _parse_xml(XML) | ||
3400 | 104 | |||
3401 | 105 | return xml_root.xpath('/phylo_storage/project_name/string_value')[0].text | ||
3402 | 106 | |||
3403 | 107 | |||
3404 | 63 | def create_name(authors, year, append=''): | 108 | def create_name(authors, year, append=''): |
3405 | 64 | """ | 109 | """ |
3406 | 65 | Construct a sensible from a list of authors and a year for a | 110 | Construct a sensible from a list of authors and a year for a |
3407 | @@ -161,6 +206,22 @@ | |||
3408 | 161 | 206 | ||
3409 | 162 | return names | 207 | return names |
3410 | 163 | 208 | ||
3411 | 209 | def get_all_tree_names(XML): | ||
3412 | 210 | """ From a full XML-PHYML string, extract all tree names. | ||
3413 | 211 | """ | ||
3414 | 212 | |||
3415 | 213 | xml_root = _parse_xml(XML) | ||
3416 | 214 | find = etree.XPath("//source") | ||
3417 | 215 | sources = find(xml_root) | ||
3418 | 216 | names = [] | ||
3419 | 217 | for s in sources: | ||
3420 | 218 | for st in s.xpath("source_tree"): | ||
3421 | 219 | if 'name' in st.attrib and not st.attrib['name'] == "": | ||
3422 | 220 | names.append(st.attrib['name']) | ||
3423 | 221 | |||
3424 | 222 | return names | ||
3425 | 223 | |||
3426 | 224 | |||
3427 | 164 | def set_unique_names(XML): | 225 | def set_unique_names(XML): |
3428 | 165 | """ Ensures all sources have unique names. | 226 | """ Ensures all sources have unique names. |
3429 | 166 | """ | 227 | """ |
3430 | @@ -249,9 +310,17 @@ | |||
3431 | 249 | if (ele.tag == "source"): | 310 | if (ele.tag == "source"): |
3432 | 250 | sources.append(ele) | 311 | sources.append(ele) |
3433 | 251 | 312 | ||
3434 | 313 | if overwrite: | ||
3435 | 314 | # remove all the names first | ||
3436 | 315 | for s in sources: | ||
3437 | 316 | for st in s.xpath("source_tree"): | ||
3438 | 317 | if 'name' in st.attrib: | ||
3439 | 318 | del st.attrib['name'] | ||
3440 | 319 | |||
3441 | 320 | |||
3442 | 252 | for s in sources: | 321 | for s in sources: |
3443 | 253 | for st in s.xpath("source_tree"): | 322 | for st in s.xpath("source_tree"): |
3445 | 254 | if overwrite or not 'name' in st.attrib: | 323 | if not 'name' in st.attrib: |
3446 | 255 | tree_name = create_tree_name(XML,st) | 324 | tree_name = create_tree_name(XML,st) |
3447 | 256 | st.attrib['name'] = tree_name | 325 | st.attrib['name'] = tree_name |
3448 | 257 | 326 | ||
3449 | @@ -339,7 +408,7 @@ | |||
3450 | 339 | taxa = etree.SubElement(s_tree,"taxa_data") | 408 | taxa = etree.SubElement(s_tree,"taxa_data") |
3451 | 340 | taxa.tail="\n " | 409 | taxa.tail="\n " |
3452 | 341 | # Note: we do not add all elements as otherwise they get set to some option | 410 | # Note: we do not add all elements as otherwise they get set to some option |
3454 | 342 | # rather than remaining blank (and hence blue int he interface) | 411 | # rather than remaining blank (and hence blue in the interface) |
3455 | 343 | 412 | ||
3456 | 344 | # append our new source to the main tree | 413 | # append our new source to the main tree |
3457 | 345 | # if sources has no valid source, overwrite, | 414 | # if sources has no valid source, overwrite, |
3458 | @@ -877,7 +946,7 @@ | |||
3459 | 877 | # Need to add checks on the file. Problems include: | 946 | # Need to add checks on the file. Problems include: |
3460 | 878 | # TNT: outputs Phyllip format or something - basically a Newick | 947 | # TNT: outputs Phyllip format or something - basically a Newick |
3461 | 879 | # string without commas, so add 'em back in | 948 | # string without commas, so add 'em back in |
3463 | 880 | m = re.search(r'proc-;', content) | 949 | m = re.search(r'proc.;', content) |
3464 | 881 | if (m != None): | 950 | if (m != None): |
3465 | 882 | # TNT output tree | 951 | # TNT output tree |
3466 | 883 | # Done on a Mac? Replace ^M with a newline | 952 | # Done on a Mac? Replace ^M with a newline |
3467 | @@ -1402,6 +1471,36 @@ | |||
3468 | 1402 | 1471 | ||
3469 | 1403 | return _amalgamate_trees(trees,format,anonymous) | 1472 | return _amalgamate_trees(trees,format,anonymous) |
3470 | 1404 | 1473 | ||
3471 | 1474 | def get_taxa_from_tree_for_taxonomy(tree, pretty=False, ignoreErrors=False): | ||
3472 | 1475 | """Returns a list of all taxa available for the tree passed as argument. | ||
3473 | 1476 | :param tree: string with the data for the tree in Newick format. | ||
3474 | 1477 | :type tree: string | ||
3475 | 1478 | :param pretty: defines if '_' in taxa names should be replaced with spaces. | ||
3476 | 1479 | :type pretty: boolean | ||
3477 | 1480 | :param ignoreErrors: should execution continue on error? | ||
3478 | 1481 | :type ignoreErrors: boolean | ||
3479 | 1482 | :returns: list of strings with the taxa names, sorted alphabetically | ||
3480 | 1483 | :rtype: list | ||
3481 | 1484 | """ | ||
3482 | 1485 | taxa_list = [] | ||
3483 | 1486 | |||
3484 | 1487 | try: | ||
3485 | 1488 | taxa_list.extend(_getTaxaFromNewick(tree)) | ||
3486 | 1489 | except TreeParseError as detail: | ||
3487 | 1490 | if (ignoreErrors): | ||
3488 | 1491 | logging.warning(detail.msg) | ||
3489 | 1492 | pass | ||
3490 | 1493 | else: | ||
3491 | 1494 | raise TreeParseError( detail.msg ) | ||
3492 | 1495 | |||
3493 | 1496 | # now uniquify the list of taxa | ||
3494 | 1497 | taxa_list = _uniquify(taxa_list) | ||
3495 | 1498 | taxa_list.sort() | ||
3496 | 1499 | |||
3497 | 1500 | if (pretty): | ||
3498 | 1501 | taxa_list = [x.replace('_', ' ') for x in taxa_list] | ||
3499 | 1502 | |||
3500 | 1503 | return taxa_list | ||
3501 | 1405 | 1504 | ||
3502 | 1406 | def get_all_taxa(XML, pretty=False, ignoreErrors=False): | 1505 | def get_all_taxa(XML, pretty=False, ignoreErrors=False): |
3503 | 1407 | """ Produce a taxa list by scanning all trees within | 1506 | """ Produce a taxa list by scanning all trees within |
3504 | @@ -1422,21 +1521,17 @@ | |||
3505 | 1422 | taxa_list.extend(_getTaxaFromNewick(t)) | 1521 | taxa_list.extend(_getTaxaFromNewick(t)) |
3506 | 1423 | except TreeParseError as detail: | 1522 | except TreeParseError as detail: |
3507 | 1424 | if (ignoreErrors): | 1523 | if (ignoreErrors): |
3508 | 1524 | logging.warning(detail.msg) | ||
3509 | 1425 | pass | 1525 | pass |
3510 | 1426 | else: | 1526 | else: |
3511 | 1427 | raise TreeParseError( detail.msg ) | 1527 | raise TreeParseError( detail.msg ) |
3512 | 1428 | 1528 | ||
3513 | 1429 | |||
3514 | 1430 | |||
3515 | 1431 | # now uniquify the list of taxa | 1529 | # now uniquify the list of taxa |
3516 | 1432 | taxa_list = _uniquify(taxa_list) | 1530 | taxa_list = _uniquify(taxa_list) |
3517 | 1433 | taxa_list.sort() | 1531 | taxa_list.sort() |
3518 | 1434 | 1532 | ||
3524 | 1435 | if (pretty): | 1533 | if (pretty): #Remove underscores from names |
3525 | 1436 | unpretty_tl = taxa_list | 1534 | taxa_list = [x.replace('_', ' ') for x in taxa_list] |
3521 | 1437 | taxa_list = [] | ||
3522 | 1438 | for t in unpretty_tl: | ||
3523 | 1439 | taxa_list.append(t.replace('_',' ')) | ||
3526 | 1440 | 1535 | ||
3527 | 1441 | return taxa_list | 1536 | return taxa_list |
3528 | 1442 | 1537 | ||
3529 | @@ -1508,7 +1603,7 @@ | |||
3530 | 1508 | return outgroups | 1603 | return outgroups |
3531 | 1509 | 1604 | ||
3532 | 1510 | 1605 | ||
3534 | 1511 | def create_matrix(XML,format="hennig",quote=False,taxonomy=None,outgroups=False,ignoreWarnings=False): | 1606 | def create_matrix(XML,format="hennig",quote=False,taxonomy=None,outgroups=False,ignoreWarnings=False, verbose=False): |
3535 | 1512 | """ From all trees in the XML, create a matrix | 1607 | """ From all trees in the XML, create a matrix |
3536 | 1513 | """ | 1608 | """ |
3537 | 1514 | 1609 | ||
3538 | @@ -1553,7 +1648,7 @@ | |||
3539 | 1553 | taxa.sort() | 1648 | taxa.sort() |
3540 | 1554 | taxa.insert(0,"MRP_Outgroup") | 1649 | taxa.insert(0,"MRP_Outgroup") |
3541 | 1555 | 1650 | ||
3543 | 1556 | return _create_matrix(trees, taxa, format=format, quote=quote, weights=weights) | 1651 | return _create_matrix(trees, taxa, format=format, quote=quote, weights=weights,verbose=verbose) |
3544 | 1557 | 1652 | ||
3545 | 1558 | 1653 | ||
3546 | 1559 | def create_matrix_from_trees(trees,format="hennig"): | 1654 | def create_matrix_from_trees(trees,format="hennig"): |
3547 | @@ -1925,7 +2020,7 @@ | |||
3548 | 1925 | _check_data(XML) | 2020 | _check_data(XML) |
3549 | 1926 | 2021 | ||
3550 | 1927 | xml_root = _parse_xml(XML) | 2022 | xml_root = _parse_xml(XML) |
3552 | 1928 | proj_name = xml_root.xpath('/phylo_storage/project_name/string_value')[0].text | 2023 | proj_name = get_project_name(XML) |
3553 | 1929 | 2024 | ||
3554 | 1930 | output_string = "======================\n" | 2025 | output_string = "======================\n" |
3555 | 1931 | output_string += " Data summary of: " + proj_name + "\n" | 2026 | output_string += " Data summary of: " + proj_name + "\n" |
3556 | @@ -1989,6 +2084,188 @@ | |||
3557 | 1989 | 2084 | ||
3558 | 1990 | return output_string | 2085 | return output_string |
3559 | 1991 | 2086 | ||
3560 | 2087 | def taxonomic_checker_list(name_list,existing_data=None,verbose=False): | ||
3561 | 2088 | """ For each name in the database generate a database of the original name, | ||
3562 | 2089 | possible synonyms and if the taxon is not known, signal that. We do this by | ||
3563 | 2090 | using the EoL API to grab synonyms of each taxon. """ | ||
3564 | 2091 | |||
3565 | 2092 | import urllib2 | ||
3566 | 2093 | from urllib import quote_plus | ||
3567 | 2094 | import simplejson as json | ||
3568 | 2095 | |||
3569 | 2096 | if existing_data == None: | ||
3570 | 2097 | equivalents = {} | ||
3571 | 2098 | else: | ||
3572 | 2099 | equivalents = existing_data | ||
3573 | 2100 | |||
3574 | 2101 | # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result? | ||
3575 | 2102 | # if not, is there another API function to do this? | ||
3576 | 2103 | # search for the taxon and grab the name - if you search for a recognised synonym on EoL then | ||
3577 | 2104 | # you get the original ('correct') name - shorten this to two words and you're done. | ||
3578 | 2105 | for t in name_list: | ||
3579 | 2106 | if t in equivalents: | ||
3580 | 2107 | continue | ||
3581 | 2108 | taxon = t.replace("_"," ") | ||
3582 | 2109 | if (verbose): | ||
3583 | 2110 | print "Looking up ", taxon | ||
3584 | 2111 | # get the data from EOL on taxon | ||
3585 | 2112 | taxonq = quote_plus(taxon) | ||
3586 | 2113 | URL = "http://eol.org/api/search/1.0.json?q="+taxonq | ||
3587 | 2114 | req = urllib2.Request(URL) | ||
3588 | 2115 | opener = urllib2.build_opener() | ||
3589 | 2116 | f = opener.open(req) | ||
3590 | 2117 | data = json.load(f) | ||
3591 | 2118 | # check if there's some data | ||
3592 | 2119 | if len(data['results']) == 0: | ||
3593 | 2120 | equivalents[t] = [[t],'red'] | ||
3594 | 2121 | continue | ||
3595 | 2122 | amber = False | ||
3596 | 2123 | if len(data['results']) > 1: | ||
3597 | 2124 | # this is not great - we have multiple hits for this taxon - the user needs to go back and check this | ||
3598 | 2125 | # for automatic processing we'll just take the first one though | ||
3599 | 2126 | # colour is amber in this case | ||
3600 | 2127 | amber = True | ||
3601 | 2128 | ID = str(data['results'][0]['id']) # take first hit | ||
3602 | 2129 | URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=0&videos=0&sounds=0&maps=0&text=0&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0" | ||
3603 | 2130 | req = urllib2.Request(URL) | ||
3604 | 2131 | opener = urllib2.build_opener() | ||
3605 | 2132 | |||
3606 | 2133 | try: | ||
3607 | 2134 | f = opener.open(req) | ||
3608 | 2135 | except urllib2.HTTPError: | ||
3609 | 2136 | equivalents[t] = [[t],'red'] | ||
3610 | 2137 | continue | ||
3611 | 2138 | data = json.load(f) | ||
3612 | 2139 | if len(data['scientificName']) == 0: | ||
3613 | 2140 | # not found a scientific name, so set as red | ||
3614 | 2141 | equivalents[t] = [[t],'red'] | ||
3615 | 2142 | continue | ||
3616 | 2143 | correct_name = data['scientificName'].encode("ascii","ignore") | ||
3617 | 2144 | # we only want the first two bits of the name, not the original author and year if any | ||
3618 | 2145 | temp_name = correct_name.split(' ') | ||
3619 | 2146 | if (len(temp_name) > 2): | ||
3620 | 2147 | correct_name = ' '.join(temp_name[0:2]) | ||
3621 | 2148 | correct_name = correct_name.replace(' ','_') | ||
3622 | 2149 | |||
3623 | 2150 | # build up the output dictionary - original name is key, synonyms/missing is value | ||
3624 | 2151 | if (correct_name == t): | ||
3625 | 2152 | # if the original matches the 'correct', then it's green | ||
3626 | 2153 | equivalents[t] = [[t], 'green'] | ||
3627 | 2154 | else: | ||
3628 | 2155 | # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the | ||
3629 | 2156 | # 'correct' taxon at the top | ||
3630 | 2157 | eol_synonyms = data['synonyms'] | ||
3631 | 2158 | synonyms = [] | ||
3632 | 2159 | for s in eol_synonyms: | ||
3633 | 2160 | ts = s['synonym'].encode("ascii","ignore") | ||
3634 | 2161 | temp_syn = ts.split(' ') | ||
3635 | 2162 | if (len(temp_syn) > 2): | ||
3636 | 2163 | temp_syn = ' '.join(temp_syn[0:2]) | ||
3637 | 2164 | ts = temp_syn | ||
3638 | 2165 | if (s['relationship'] == "synonym"): | ||
3639 | 2166 | ts = ts.replace(" ","_") | ||
3640 | 2167 | synonyms.append(ts) | ||
3641 | 2168 | synonyms = _uniquify(synonyms) | ||
3642 | 2169 | # we need to put the correct name at the top of the list now | ||
3643 | 2170 | if (correct_name in synonyms): | ||
3644 | 2171 | synonyms.insert(0, synonyms.pop(synonyms.index(correct_name))) | ||
3645 | 2172 | elif len(synonyms) == 0: | ||
3646 | 2173 | synonyms.append(correct_name) | ||
3647 | 2174 | else: | ||
3648 | 2175 | synonyms.insert(0,correct_name) | ||
3649 | 2176 | |||
3650 | 2177 | if (amber): | ||
3651 | 2178 | equivalents[t] = [synonyms,'amber'] | ||
3652 | 2179 | else: | ||
3653 | 2180 | equivalents[t] = [synonyms,'yellow'] | ||
3654 | 2181 | # if our search was empty, then it's red - see above | ||
3655 | 2182 | |||
3656 | 2183 | # up to the calling function to do something sensible with this | ||
3657 | 2184 | # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow, red. | ||
3658 | 2185 | # Amber means we found synonyms and multiple hits. The user definitely needs to sort these! | ||
3659 | 2186 | |||
3660 | 2187 | return equivalents | ||
3661 | 2188 | |||
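A rough usage sketch for taxonomic_checker_list (the taxon names are only illustrative, and the call needs network access to the EoL API):

    equivalents = taxonomic_checker_list(["Gallus_gallus", "Ardea_goliath"], verbose=True)
    for taxon, (synonyms, status) in equivalents.items():
        # status is 'green', 'yellow', 'amber' or 'red'; synonyms[0] is the preferred name
        print taxon, status, synonyms[0]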
3662 | 2189 | def taxonomic_checker_tree(tree_file,existing_data=None,verbose=False): | ||
3663 | 2190 | """ For each name in the database generate a database of the original name, | ||
3664 | 2191 | possible synonyms and if the taxon is not known, signal that. We do this by | ||
3665 | 2192 | using the EoL API to grab synonyms of each taxon. """ | ||
3666 | 2193 | |||
3667 | 2194 | tree = import_tree(tree_file) | ||
3668 | 2195 | p4tree = _parse_tree(tree) | ||
3669 | 2196 | taxa = p4tree.getAllLeafNames(p4tree.root) | ||
3670 | 2197 | if existing_data == None: | ||
3671 | 2198 | equivalents = {} | ||
3672 | 2199 | else: | ||
3673 | 2200 | equivalents = existing_data | ||
3674 | 2201 | |||
3675 | 2202 | equivalents = taxonomic_checker_list(taxa,existing_data,verbose) | ||
3676 | 2203 | return equivalents | ||
3677 | 2204 | |||
3678 | 2205 | def taxonomic_checker(XML,existing_data=None,verbose=False): | ||
3679 | 2206 | """ For each name in the database generate a database of the original name, | ||
3680 | 2207 | possible synonyms and if the taxon is not known, signal that. We do this by | ||
3681 | 2208 | using the EoL API to grab synonyms of each taxon. """ | ||
3682 | 2209 | |||
3683 | 2210 | # grab all taxa | ||
3684 | 2211 | taxa = get_all_taxa(XML) | ||
3685 | 2212 | |||
3686 | 2213 | if existing_data == None: | ||
3687 | 2214 | equivalents = {} | ||
3688 | 2215 | else: | ||
3689 | 2216 | equivalents = existing_data | ||
3690 | 2217 | |||
3691 | 2218 | equivalents = taxonomic_checker_list(taxa,existing_data,verbose) | ||
3692 | 2219 | return equivalents | ||
3693 | 2220 | |||
3694 | 2221 | |||
3695 | 2222 | def load_equivalents(equiv_csv): | ||
3696 | 2223 | """Load equivalents data from a csv and convert to an equivalents dict. | ||
3697 | 2224 | Each key maps to a list containing the array of synonyms, followed by a status ('green', | ||
3698 | 2225 | 'yellow' or 'red'). | ||
3699 | 2226 | |||
3700 | 2227 | """ | ||
3701 | 2228 | |||
3702 | 2229 | import csv | ||
3703 | 2230 | |||
3704 | 2231 | equivalents = {} | ||
3705 | 2232 | |||
3706 | 2233 | with open(equiv_csv, 'rU') as csvfile: | ||
3707 | 2234 | equiv_reader = csv.reader(csvfile, delimiter=',') | ||
3708 | 2235 | equiv_reader.next() # skip header | ||
3709 | 2236 | for row in equiv_reader: | ||
3710 | 2237 | i = 1 | ||
3711 | 2238 | equivalents[row[0]] = [row[1].split(';'),row[2]] | ||
3712 | 2239 | |||
3713 | 2240 | return equivalents | ||
3714 | 2241 | |||
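Judging from the parsing above, the equivalents CSV is expected to have a header row, one taxon per line, synonyms separated by semicolons and the status in the last column; the column names and taxa below are only a guess at the layout:

    Original,Synonyms,Status
    Ardea_goliath,Ardea_goliath,green
    Gallus_lafayetii,Gallus_lafayettii;Gallus_lafayetii,yellow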
3715 | 2242 | def save_taxonomy(taxonomy, output_file): | ||
3716 | 2243 | |||
3717 | 2244 | import csv | ||
3718 | 2245 | |||
3719 | 2246 | with open(output_file, 'w') as f: | ||
3720 | 2247 | writer = csv.writer(f) | ||
3721 | 2248 | row = ['OTU'] | ||
3722 | 2249 | row.extend(taxonomy_levels) | ||
3723 | 2250 | row.append('Provider') | ||
3724 | 2251 | writer.writerow(row) | ||
3725 | 2252 | for t in taxonomy: | ||
3726 | 2253 | species = t | ||
3727 | 2254 | row = [] | ||
3728 | 2255 | row.append(t.encode('utf-8')) | ||
3729 | 2256 | for l in taxonomy_levels: | ||
3730 | 2257 | try: | ||
3731 | 2258 | g = taxonomy[t][l] | ||
3732 | 2259 | except KeyError: | ||
3733 | 2260 | g = '-' | ||
3734 | 2261 | row.append(g.encode('utf-8')) | ||
3735 | 2262 | try: | ||
3736 | 2263 | provider = taxonomy[t]['provider'] | ||
3737 | 2264 | except KeyError: | ||
3738 | 2265 | provider = "-" | ||
3739 | 2266 | row.append(provider) | ||
3740 | 2267 | |||
3741 | 2268 | writer.writerow(row) | ||
3742 | 1992 | 2269 | ||
3743 | 1993 | 2270 | ||
3744 | 1994 | def load_taxonomy(taxonomy_csv): | 2271 | def load_taxonomy(taxonomy_csv): |
3745 | @@ -2000,20 +2277,443 @@ | |||
3746 | 2000 | 2277 | ||
3747 | 2001 | with open(taxonomy_csv, 'rU') as csvfile: | 2278 | with open(taxonomy_csv, 'rU') as csvfile: |
3748 | 2002 | tax_reader = csv.reader(csvfile, delimiter=',') | 2279 | tax_reader = csv.reader(csvfile, delimiter=',') |
3763 | 2003 | tax_reader.next() | 2280 | try: |
3764 | 2004 | for row in tax_reader: | 2281 | j = 0 |
3765 | 2005 | current_taxonomy = {} | 2282 | for row in tax_reader: |
3766 | 2006 | i = 1 | 2283 | if j == 0: |
3767 | 2007 | for t in taxonomy_levels: | 2284 | tax_levels = row[1:-1] |
3768 | 2008 | if not row[i] == '-': | 2285 | j += 1 |
3769 | 2009 | current_taxonomy[t] = row[i] | 2286 | continue |
3770 | 2010 | i = i+ 1 | 2287 | i = 1 |
3771 | 2011 | 2288 | current_taxonomy = {} | |
3772 | 2012 | current_taxonomy['provider'] = row[17] # data source | 2289 | for t in tax_levels: |
3773 | 2013 | taxonomy[row[0]] = current_taxonomy | 2290 | if not row[i] == '-': |
3774 | 2014 | 2291 | current_taxonomy[t] = row[i] | |
3775 | 2015 | return taxonomy | 2292 | i = i+ 1 |
3776 | 2016 | 2293 | current_taxonomy['provider'] = row[-1] # data source | |
3777 | 2294 | taxonomy[row[0].replace(" ","_")] = current_taxonomy | ||
3778 | 2295 | j += 1 | ||
3779 | 2296 | except: | ||
3780 | 2297 | pass | ||
3781 | 2298 | |||
3782 | 2299 | return taxonomy | ||
3783 | 2300 | |||
3784 | 2301 | |||
3785 | 2302 | class TaxonomyFetcher(threading.Thread): | ||
3786 | 2303 | """ Class to provide the taxonomy fetching functionality as a threaded function to be used individually or working with a pool. | ||
3787 | 2304 | """ | ||
3788 | 2305 | |||
3789 | 2306 | def __init__(self, taxonomy, lock, queue, id=0, pref_db=None, verbose=False, ignoreWarnings=False): | ||
3790 | 2307 | """ Constructor for the threaded model. | ||
3791 | 2308 | :param taxonomy: previous taxonomy available (if available) or an empty dictionary to store the results . | ||
3792 | 2309 | :type taxonomy: dictionary | ||
3793 | 2310 | :param lock: lock to keep the taxonomy threadsafe. | ||
3794 | 2311 | :type lock: Lock | ||
3795 | 2312 | :param queue: queue where the taxa are kept to be processed. | ||
3796 | 2313 | :type queue: Queue of strings | ||
3797 | 2314 | :param id: id for the thread to use if messages need to be printed. | ||
3798 | 2315 | :type id: int | ||
3799 | 2316 | :param pref_db: Gives priority to database. Seems it is unused. | ||
3800 | 2317 | :type pref_db: string | ||
3801 | 2318 | :param verbose: Show verbose messages during execution, will also define level of logging. True will set logging level to INFO. | ||
3802 | 2319 | :type verbose: boolean | ||
3803 | 2320 | :param ignoreWarnings: Ignore warnings and errors during execution? Errors will be logged with ERROR level on the logging output. | ||
3804 | 2321 | :type ignoreWarnings: boolean | ||
3805 | 2322 | """ | ||
3806 | 2323 | |||
3807 | 2324 | threading.Thread.__init__(self) | ||
3808 | 2325 | self.taxonomy = taxonomy | ||
3809 | 2326 | self.lock = lock | ||
3810 | 2327 | self.queue = queue | ||
3811 | 2328 | self.id = id | ||
3812 | 2329 | self.verbose = verbose | ||
3813 | 2330 | self.pref_db = pref_db | ||
3814 | 2331 | self.ignoreWarnings = ignoreWarnings | ||
3815 | 2332 | |||
3816 | 2333 | def run(self): | ||
3817 | 2334 | """ Gets and processes a taxon from the queue to get its taxonomy.""" | ||
3818 | 2335 | while True : | ||
3819 | 2336 | if self.verbose : | ||
3820 | 2337 | logging.getLogger().setLevel(logging.INFO) | ||
3821 | 2338 | #get taxon from queue | ||
3822 | 2339 | taxon = self.queue.get() | ||
3823 | 2340 | |||
3824 | 2341 | logging.debug("Starting {} with thread #{} remaining ~{}".format(taxon,str(self.id),str(self.queue.qsize()))) | ||
3825 | 2342 | |||
3826 | 2343 | #Lock access to the taxonomy | ||
3827 | 2344 | self.lock.acquire() | ||
3828 | 2345 | if not taxon in self.taxonomy: # is a new taxon, not previously in the taxonomy | ||
3829 | 2346 | #Release access to the taxonomy | ||
3830 | 2347 | self.lock.release() | ||
3831 | 2348 | if (self.verbose): | ||
3832 | 2349 | print "Looking up ", taxon | ||
3833 | 2350 | logging.info("Loolking up taxon: {}".format(str(taxon))) | ||
3834 | 2351 | try: | ||
3835 | 2352 | # get the data from EOL on taxon | ||
3836 | 2353 | taxonq = quote_plus(taxon) | ||
3837 | 2354 | URL = "http://eol.org/api/search/1.0.json?q="+taxonq | ||
3838 | 2355 | req = urllib2.Request(URL) | ||
3839 | 2356 | opener = urllib2.build_opener() | ||
3840 | 2357 | f = opener.open(req) | ||
3841 | 2358 | data = json.load(f) | ||
3842 | 2359 | # check if there's some data | ||
3843 | 2360 | if len(data['results']) == 0: | ||
3844 | 2361 | # try PBDB as it might be a fossil | ||
3845 | 2362 | URL = "http://paleobiodb.org/data1.1/taxa/single.json?name="+taxonq+"&show=phylo&vocab=pbdb" | ||
3846 | 2363 | req = urllib2.Request(URL) | ||
3847 | 2364 | opener = urllib2.build_opener() | ||
3848 | 2365 | f = opener.open(req) | ||
3849 | 2366 | datapbdb = json.load(f) | ||
3850 | 2367 | if (len(datapbdb['records']) == 0): | ||
3851 | 2368 | # no idea! | ||
3852 | 2369 | with self.lock: | ||
3853 | 2370 | self.taxonomy[taxon] = {} | ||
3854 | 2371 | self.queue.task_done() | ||
3855 | 2372 | continue | ||
3856 | 2373 | # otherwise, let's fill in info here - only if extinct! | ||
3857 | 2374 | if datapbdb['records'][0]['is_extant'] == 0: | ||
3858 | 2375 | this_taxonomy = {} | ||
3859 | 2376 | this_taxonomy['provider'] = 'PBDB' | ||
3860 | 2377 | for level in taxonomy_levels: | ||
3861 | 2378 | try: | ||
3862 | 2379 | if datapbdb.has_key('records'): | ||
3863 | 2380 | pbdb_lev = datapbdb['records'][0][level] | ||
3864 | 2381 | temp_lev = pbdb_lev.split(" ") | ||
3865 | 2382 | # they might have the author on the end, so strip it off | ||
3866 | 2383 | if (level == 'species'): | ||
3867 | 2384 | this_taxonomy[level] = ' '.join(temp_lev[0:2]) | ||
3868 | 2385 | else: | ||
3869 | 2386 | this_taxonomy[level] = temp_lev[0] | ||
3870 | 2387 | except KeyError as e: | ||
3871 | 2388 | logging.exception("Key not found records") | ||
3872 | 2389 | continue | ||
3873 | 2390 | # add the taxon at right level too | ||
3874 | 2391 | try: | ||
3875 | 2392 | if datapbdb.has_key('records'): | ||
3876 | 2393 | current_level = datapbdb['records'][0]['rank'] | ||
3877 | 2394 | this_taxonomy[current_level] = datapbdb['records'][0]['taxon_name'] | ||
3878 | 2395 | except KeyError as e: | ||
3879 | 2396 | self.queue.task_done() | ||
3880 | 2397 | logging.exception("Key not found records") | ||
3881 | 2398 | continue | ||
3882 | 2399 | with self.lock: | ||
3883 | 2400 | self.taxonomy[taxon] = this_taxonomy | ||
3884 | 2401 | self.queue.task_done() | ||
3885 | 2402 | continue | ||
3886 | 2403 | else: | ||
3887 | 2404 | # extant, but not in EoL - leave the user to sort this one out | ||
3888 | 2405 | with self.lock: | ||
3889 | 2406 | self.taxonomy[taxon] = {} | ||
3890 | 2407 | self.queue.task_done() | ||
3891 | 2408 | continue | ||
3892 | 2409 | |||
3893 | 2410 | |||
3894 | 2411 | ID = str(data['results'][0]['id']) # take first hit | ||
3895 | 2412 | # Now look for taxonomies | ||
3896 | 2413 | URL = "http://eol.org/api/pages/1.0/"+ID+".json" | ||
3897 | 2414 | req = urllib2.Request(URL) | ||
3898 | 2415 | opener = urllib2.build_opener() | ||
3899 | 2416 | f = opener.open(req) | ||
3900 | 2417 | data = json.load(f) | ||
3901 | 2418 | if len(data['taxonConcepts']) == 0: | ||
3902 | 2419 | with self.lock: | ||
3903 | 2420 | self.taxonomy[taxon] = {} | ||
3904 | 2421 | self.queue.task_done() | ||
3905 | 2422 | continue | ||
3906 | 2423 | TID = str(data['taxonConcepts'][0]['identifier']) # take first hit | ||
3907 | 2424 | currentdb = str(data['taxonConcepts'][0]['nameAccordingTo']) | ||
3908 | 2425 | # loop through and get preferred one if specified | ||
3909 | 2426 | # now get taxonomy | ||
3910 | 2427 | if (not self.pref_db is None): | ||
3911 | 2428 | for db in data['taxonConcepts']: | ||
3912 | 2429 | currentdb = db['nameAccordingTo'].lower() | ||
3913 | 2430 | if (self.pref_db.lower() in currentdb): | ||
3914 | 2431 | TID = str(db['identifier']) | ||
3915 | 2432 | break | ||
3916 | 2433 | URL="http://eol.org/api/hierarchy_entries/1.0/"+TID+".json" | ||
3917 | 2434 | req = urllib2.Request(URL) | ||
3918 | 2435 | opener = urllib2.build_opener() | ||
3919 | 2436 | f = opener.open(req) | ||
3920 | 2437 | data = json.load(f) | ||
3921 | 2438 | this_taxonomy = {} | ||
3922 | 2439 | this_taxonomy['provider'] = currentdb | ||
3923 | 2440 | for a in data['ancestors']: | ||
3924 | 2441 | try: | ||
3925 | 2442 | if a.has_key('taxonRank') : | ||
3926 | 2443 | temp_level = a['taxonRank'].encode("ascii","ignore") | ||
3927 | 2444 | if (temp_level in taxonomy_levels): | ||
3928 | 2445 | # note the dump into ASCII | ||
3929 | 2446 | temp_name = a['scientificName'].encode("ascii","ignore") | ||
3930 | 2447 | temp_name = temp_name.split(" ") | ||
3931 | 2448 | if (temp_level == 'species'): | ||
3932 | 2449 | this_taxonomy[temp_level] = ' '.join(temp_name[0:2]) | ||
3933 | 2450 | |||
3934 | 2451 | else: | ||
3935 | 2452 | this_taxonomy[temp_level] = temp_name[0] | ||
3936 | 2453 | except KeyError as e: | ||
3937 | 2454 | logging.exception("Key not found: taxonRank") | ||
3938 | 2455 | continue | ||
3939 | 2456 | try: | ||
3940 | 2457 | # add the queried taxon itself into the taxonomy at its own rank | ||
3941 | 2458 | # some issues here, so let's make sure it's OK | ||
3942 | 2459 | temp_name = taxon.split(" ") | ||
3943 | 2460 | if data.has_key('taxonRank') : | ||
3944 | 2461 | if not data['taxonRank'].lower() == 'species': | ||
3945 | 2462 | this_taxonomy[data['taxonRank'].lower()] = temp_name[0] | ||
3946 | 2463 | else: | ||
3947 | 2464 | this_taxonomy[data['taxonRank'].lower()] = ' '.join(temp_name[0:2]) | ||
3948 | 2465 | except KeyError as e: | ||
3949 | 2466 | self.queue.task_done() | ||
3950 | 2467 | logging.exception("Key not found: taxonRank") | ||
3951 | 2468 | continue | ||
3952 | 2469 | with self.lock: | ||
3953 | 2470 | #Send result to dictionary | ||
3954 | 2471 | self.taxonomy[taxon] = this_taxonomy | ||
3955 | 2472 | except urllib2.HTTPError: | ||
3956 | 2473 | print("Network error when processing {} ".format(taxon,)) | ||
3957 | 2474 | logging.info("Network error when processing {} ".format(taxon,)) | ||
3958 | 2475 | self.queue.task_done() | ||
3959 | 2476 | continue | ||
3960 | 2477 | except urllib2.URLError: | ||
3961 | 2478 | print("Network error when processing {} ".format(taxon,)) | ||
3962 | 2479 | logging.info("Network error when processing {} ".format(taxon,)) | ||
3963 | 2480 | self.queue.task_done() | ||
3964 | 2481 | continue | ||
3965 | 2482 | else : | ||
3966 | 2483 | #Nothing to do, release the lock on the taxonomy | ||
3967 | 2484 | self.lock.release() | ||
3968 | 2485 | #Mark task as done | ||
3969 | 2486 | self.queue.task_done() | ||
3970 | 2487 | |||
3971 | 2488 | def create_taxonomy_from_taxa(taxa, taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False, threadNumber=5): | ||
3972 | 2489 | """Uses the taxa provided to generate a taxonomy for all the taxa provided. | ||
3973 | 2490 | :param taxa: list of the taxa. | ||
3974 | 2491 | :type taxa : list | ||
3975 | 2492 | :param taxonomy: previous taxonomy available (if available) or an empty | ||
3976 | 2493 | dictionary to store the results. If None will be init to an empty dictionary | ||
3977 | 2494 | :type taxonomy: dictionary | ||
3978 | 2495 | :param pref_db: Gives priority to database. Seems it is unused. | ||
3979 | 2496 | :type pref_db: string | ||
3980 | 2497 | :param verbose: Show verbose messages during execution, will also define | ||
3981 | 2498 | level of logging. True will set logging level to INFO. | ||
3982 | 2499 | :type verbose: boolean | ||
3983 | 2500 | :param ignoreWarnings: Ignore warnings and errors during execution? Errors | ||
3984 | 2501 | will be logged with ERROR level on the logging output. | ||
3985 | 2502 | :type ignoreWarnings: boolean | ||
3986 | 2503 | :param threadNumber: Maximum number of threads to use for taxonomy processing. | ||
3987 | 2504 | :type threadNumber: int | ||
3988 | 2505 | :returns: nothing; the taxonomy dictionary passed in is populated in place, one entry per taxon | ||
3989 | 2506 | :rtype: None | ||
3990 | 2507 | """ | ||
3991 | 2508 | if verbose : | ||
3992 | 2509 | logging.getLogger().setLevel(logging.INFO) | ||
3993 | 2510 | if taxonomy is None: | ||
3994 | 2511 | taxonomy = {} | ||
3995 | 2512 | |||
3996 | 2513 | lock = threading.Lock() | ||
3997 | 2514 | queue = Queue.Queue() | ||
3998 | 2515 | |||
3999 | 2516 | #Starting a few threads as daemons checking the queue | ||
4000 | 2517 | for i in range(threadNumber) : | ||
4001 | 2518 | t = TaxonomyFetcher(taxonomy, lock, queue, i, pref_db, verbose, ignoreWarnings) | ||
4002 | 2519 | t.setDaemon(True) | ||
4003 | 2520 | t.start() | ||
4004 | 2521 | |||
4005 | 2522 | #Populate the queue with the taxa. | ||
4006 | 2523 | for taxon in taxa : | ||
4007 | 2524 | queue.put(taxon) | ||
4008 | 2525 | |||
4009 | 2526 | #Wait till everyone finishes | ||
4010 | 2527 | queue.join() | ||
4011 | 2528 | logging.getLogger().setLevel(logging.WARNING) | ||
4012 | 2529 | |||
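A minimal sketch of driving the threaded fetcher above (the taxa are illustrative and the lookups need network access to EoL/PBDB):

    taxonomy = {}
    create_taxonomy_from_taxa(["Ardea goliath", "Gallus gallus"], taxonomy, threadNumber=2, verbose=True)
    # the shared dictionary is filled in place by the worker threads
    for taxon, ranks in taxonomy.items():
        print taxon, ranks.get('family', '-'), ranks.get('provider', '-')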
4013 | 2530 | def create_taxonomy_from_tree(tree, existing_taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False): | ||
4014 | 2531 | """ Generates the taxonomy from a tree. Uses a similar method to the XML version but works directly on a string with the tree. | ||
4015 | 2532 | :param tree: the tree as a Newick string. | ||
4016 | 2533 | :type tree: string | ||
4017 | 2534 | :param existing_taxonomy: previously gathered taxonomy to extend (if any). | ||
4018 | 2535 | :type existing_taxonomy: dictionary | ||
4019 | 2536 | :param pref_db: Gives priority to database. Seems it is unused. | ||
4020 | 2537 | :type pref_db: string | ||
4021 | 2538 | :param verbose: Flag for verbosity. | ||
4022 | 2539 | :type verbose: boolean | ||
4023 | 2540 | :param ignoreWarnings: Flag for exception processing. | ||
4024 | 2541 | :type ignoreWarnings: boolean | ||
4025 | 2542 | :returns: the modified taxonomy | ||
4026 | 2543 | :rtype: dictionary | ||
4027 | 2544 | """ | ||
4028 | 2545 | starttime = time.time() | ||
4029 | 2546 | |||
4030 | 2547 | if(existing_taxonomy is None) : | ||
4031 | 2548 | taxonomy = {} | ||
4032 | 2549 | else : | ||
4033 | 2550 | taxonomy = existing_taxonomy | ||
4034 | 2551 | |||
4035 | 2552 | taxa = get_taxa_from_tree_for_taxonomy(tree, pretty=True) | ||
4036 | 2553 | |||
4037 | 2554 | create_taxonomy_from_taxa(taxa, taxonomy) | ||
4038 | 2555 | |||
4039 | 2556 | taxonomy = create_extended_taxonomy(taxonomy, starttime, verbose, ignoreWarnings) | ||
4040 | 2557 | |||
4041 | 2558 | return taxonomy | ||
4042 | 2559 | |||
4043 | 2560 | def create_taxonomy(XML, existing_taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False): | ||
4044 | 2561 | """Generates a taxonomy of the data from EoL data. This is stored as a | ||
4045 | 2562 | dictionary of taxonomy for each taxon in the dataset. Missing data are | ||
4046 | 2563 | encoded as '' (blank string). It's up to the calling function to store this | ||
4047 | 2564 | data to file or display it.""" | ||
4048 | 2565 | |||
4049 | 2566 | starttime = time.time() | ||
4050 | 2567 | |||
4051 | 2568 | if not ignoreWarnings: | ||
4052 | 2569 | _check_data(XML) | ||
4053 | 2570 | |||
4054 | 2571 | if (existing_taxonomy is None): | ||
4055 | 2572 | taxonomy = {} | ||
4056 | 2573 | else: | ||
4057 | 2574 | taxonomy = existing_taxonomy | ||
4058 | 2575 | taxa = get_all_taxa(XML, pretty=True) | ||
4059 | 2576 | create_taxonomy_from_taxa(taxa, taxonomy) | ||
4060 | 2577 | #taxonomy = create_extended_taxonomy(taxonomy, starttime, verbose, ignoreWarnings) | ||
4061 | 2578 | return taxonomy | ||
4062 | 2579 | |||
4063 | 2580 | def create_extended_taxonomy(taxonomy, starttime, verbose=False, ignoreWarnings=False): | ||
4064 | 2581 | """Bring extra taxonomy terms from other databases, shared method for completing the taxonomy | ||
4065 | 2582 | both for trees comming from XML or directly from trees. | ||
4066 | 2583 | :param taxonomy: Dictionary with the relationship for taxa and taxonomy terms. | ||
4067 | 2584 | :type taxonomy: dictionary | ||
4068 | 2585 | :param starttime: time to keep track of processing time. | ||
4069 | 2586 | :type starttime: long | ||
4070 | 2587 | :param verbose: Flag for verbosity. | ||
4071 | 2588 | :type verbose: boolean | ||
4072 | 2589 | :param ignoreWarnings: Flag for exception processing. | ||
4073 | 2590 | :type ignoreWarnings: boolean | ||
4074 | 2591 | :returns: the modified taxonomy | ||
4075 | 2592 | :rtype: dictionary | ||
4076 | 2593 | """ | ||
4077 | 2594 | |||
4078 | 2595 | if (verbose): | ||
4079 | 2596 | logging.info('Done basic taxonomy, getting more info from ITIS') | ||
4080 | 2597 | print("Time elapsed {}".format(str(time.time() - starttime))) | ||
4081 | 2598 | print "Done basic taxonomy, getting more info from ITIS" | ||
4082 | 2599 | # fill in the rest of the taxonomy | ||
4083 | 2600 | # get all genera | ||
4084 | 2601 | genera = [] | ||
4085 | 2602 | for t in taxonomy: | ||
4086 | 2603 | if t in taxonomy: | ||
4087 | 2604 | if GENUS in taxonomy[t]: | ||
4088 | 2605 | genera.append(taxonomy[t][GENUS]) | ||
4089 | 2606 | genera = _uniquify(genera) | ||
4090 | 2607 | # We then use ITIS to fill in missing info based on the genera only - that saves us a species level search | ||
4091 | 2608 | # and we can fill in most of the EoL missing data | ||
4092 | 2609 | for g in genera: | ||
4093 | 2610 | if (verbose): | ||
4094 | 2611 | print "Looking up ", g | ||
4095 | 2612 | logging.info("Looking up {}".format(str(g))) | ||
4096 | 2613 | try: | ||
4097 | 2614 | URL="http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(g.strip()) | ||
4098 | 2615 | except: | ||
4099 | 2616 | continue | ||
4100 | 2617 | req = urllib2.Request(URL) | ||
4101 | 2618 | opener = urllib2.build_opener() | ||
4102 | 2619 | try: | ||
4103 | 2620 | f = opener.open(req) | ||
4104 | 2621 | except urllib2.HTTPError: | ||
4105 | 2622 | continue | ||
4106 | 2623 | string = unicode(f.read(),"ISO-8859-1") | ||
4107 | 2624 | data = json.loads(string) | ||
4108 | 2625 | if data['scientificNames'][0] == None: | ||
4109 | 2626 | continue | ||
4110 | 2627 | tsn = data["scientificNames"][0]["tsn"] | ||
4111 | 2628 | URL="http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+str(tsn) | ||
4112 | 2629 | req = urllib2.Request(URL) | ||
4113 | 2630 | opener = urllib2.build_opener() | ||
4114 | 2631 | f = opener.open(req) | ||
4115 | 2632 | try: | ||
4116 | 2633 | string = unicode(f.read(),"ISO-8859-1") | ||
4117 | 2634 | except: | ||
4118 | 2635 | continue | ||
4119 | 2636 | data = json.loads(string) | ||
4120 | 2637 | this_taxonomy = {} | ||
4121 | 2638 | for level in data['hierarchyList']: | ||
4122 | 2639 | if not level['rankName'].lower() in current_taxonomy_levels: | ||
4123 | 2640 | # note the dump into ASCII | ||
4124 | 2641 | if level['rankName'].lower() == 'species': | ||
4125 | 2642 | this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = ' '.join(level['taxonName'].split(" ")[0:2]).encode("ascii","ignore") | ||
4126 | 2643 | else: | ||
4127 | 2644 | this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore") | ||
4128 | 2645 | |||
4129 | 2646 | for t in taxonomy: | ||
4130 | 2647 | if t in taxonomy: | ||
4131 | 2648 | if GENUS in taxonomy[t]: | ||
4132 | 2649 | if taxonomy[t][GENUS] == g: | ||
4133 | 2650 | taxonomy[t].update(this_taxonomy) | ||
4134 | 2651 | |||
4135 | 2652 | return taxonomy | ||
4136 | 2653 | |||
4137 | 2654 | def generate_species_level_data(XML, taxonomy, ignoreWarnings=False, verbose=False): | ||
4138 | 2655 | """ Based on a taxonomy data set, amend the data to be at species level as | ||
4139 | 2656 | far as possible. This function creates an internal 'subs file' and calls | ||
4140 | 2657 | the standard substitution functions. The internal subs are generated by | ||
4141 | 2658 | looping over the taxa and if not at species-level, working out which level | ||
4142 | 2659 | they are at and then adding species already in the dataset to replace it | ||
4143 | 2660 | via a polytomy. This has to be done in one step to avoid adding spurious | ||
4144 | 2661 | structure to the phylogenies """ | ||
4145 | 2662 | |||
4146 | 2663 | if not ignoreWarnings: | ||
4147 | 2664 | _check_data(XML) | ||
4148 | 2665 | |||
4149 | 2666 | # if taxonomic checker not done, warn | ||
4150 | 2667 | if (not taxonomy): | ||
4151 | 2668 | raise NoneCompleteTaxonomy("Taxonomy is empty. Create a taxonomy first. You'll probably need to hand edit the file to complete it.") | ||
4152 | 2669 | return | ||
4153 | 2670 | |||
4154 | 2671 | # if missing data in taxonomy, warn | ||
4155 | 2672 | taxa = get_all_taxa(XML) | ||
4156 | 2673 | keys = taxonomy.keys() | ||
4157 | 2674 | if (not ignoreWarnings): | ||
4158 | 2675 | for t in taxa: | ||
4159 | 2676 | t = t.replace("_"," ") | ||
4160 | 2677 | if not t in keys: | ||
4161 | 2678 | # The idea here is that the caller will catch this, then re-run with ignoreWarnings set to True | ||
4162 | 2679 | raise NoneCompleteTaxonomy("Taxonomy is not complete. I will soldier on anyway, but this might not work as intended") | ||
4163 | 2680 | |||
4164 | 2681 | # get all taxa - see above! | ||
4165 | 2682 | # for each taxa, if not at species level | ||
4166 | 2683 | new_taxa = [] | ||
4167 | 2684 | old_taxa = [] | ||
4168 | 2685 | for t in taxa: | ||
4169 | 2686 | subs = [] | ||
4170 | 2687 | t = t.replace("_"," ") | ||
4171 | 2688 | if (not SPECIES in taxonomy[t]): # the current taxon is not a species, but higher level taxon | ||
4172 | 2689 | # work out which level - should we encode this in the data to start with? | ||
4173 | 2690 | for tl in taxonomy_levels: | ||
4174 | 2691 | try: | ||
4175 | 2692 | tax_data = taxonomy[t][tl] | ||
4176 | 2693 | except KeyError: | ||
4177 | 2694 | continue | ||
4178 | 2695 | if (t == taxonomy[t][tl]): | ||
4179 | 2696 | current_level = tl | ||
4180 | 2697 | # find all species in the taxonomy that match this level | ||
4181 | 2698 | for taxon in taxa: | ||
4182 | 2699 | taxon = taxon.replace("_"," ") | ||
4183 | 2700 | if (SPECIES in taxonomy[taxon]): | ||
4184 | 2701 | try: | ||
4185 | 2702 | if taxonomy[taxon][current_level] == t: # our current taxon | ||
4186 | 2703 | subs.append(taxon.replace(" ","_")) | ||
4187 | 2704 | except KeyError: | ||
4188 | 2705 | continue | ||
4189 | 2706 | |||
4190 | 2707 | # create the sub | ||
4191 | 2708 | if len(subs) > 0: | ||
4192 | 2709 | old_taxa.append(t.replace(" ","_")) | ||
4193 | 2710 | new_taxa.append(','.join(subs)) | ||
4194 | 2711 | |||
4195 | 2712 | # call the sub | ||
4196 | 2713 | new_XML = substitute_taxa(XML, old_taxa, new_taxa, verbose=verbose) | ||
4197 | 2714 | new_XML = clean_data(new_XML) | ||
4198 | 2715 | |||
4199 | 2716 | return new_XML | ||
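To illustrate the internal subs this builds, suppose the dataset contains a genus-level OTU Gallus plus two Gallus species (names used purely as an example); the paired lists would end up as:

    # conceptually the loop above produces paired lists like:
    old_taxa = ['Gallus']
    new_taxa = ['Gallus_gallus,Gallus_lafayetii']
    # substitute_taxa() then replaces Gallus with the polytomy
    # (Gallus_gallus,Gallus_lafayetii) in every source tree in one pass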
4200 | 2017 | 2717 | ||
4201 | 2018 | def data_overlap(XML, overlap_amount=2, filename=None, detailed=False, show=False, verbose=False, ignoreWarnings=False): | 2718 | def data_overlap(XML, overlap_amount=2, filename=None, detailed=False, show=False, verbose=False, ignoreWarnings=False): |
4202 | 2019 | """ Calculate the amount of taxonomic overlap between source trees. | 2719 | """ Calculate the amount of taxonomic overlap between source trees. |
4203 | @@ -2024,7 +2724,7 @@ | |||
4204 | 2024 | If filename is None, no graphic is generated. Otherwise a simple | 2724 | If filename is None, no graphic is generated. Otherwise a simple |
4205 | 2025 | graphic is generated showing the number of cluster. If detailed is set to | 2725 | graphic is generated showing the number of cluster. If detailed is set to |
4206 | 2026 | true, a graphic is generated showing *all* trees. For data containing >200 | 2726 | true, a graphic is generated showing *all* trees. For data containing >200 |
4208 | 2027 | source tres this could be very big and take along time. More likely, you'll run | 2727 | source trees this could be very big and take a long time. More likely, you'll run
4209 | 2028 | out of memory. | 2728 | out of memory. |
4210 | 2029 | """ | 2729 | """ |
4211 | 2030 | import matplotlib | 2730 | import matplotlib |
4212 | @@ -2103,6 +2803,7 @@ | |||
4213 | 2103 | sufficient_overlap = True | 2803 | sufficient_overlap = True |
4214 | 2104 | 2804 | ||
4215 | 2105 | # The above list actually contains which components are seperate from each other | 2805 | # The above list actually contains which components are seperate from each other |
4216 | 2806 | key_list = connected_components | ||
4217 | 2106 | 2807 | ||
4218 | 2107 | if (not filename == None or show): | 2808 | if (not filename == None or show): |
4219 | 2108 | if (verbose): | 2809 | if (verbose): |
4220 | @@ -2266,7 +2967,9 @@ | |||
4221 | 2266 | prev_char = None | 2967 | prev_char = None |
4222 | 2267 | prev_taxa = None | 2968 | prev_taxa = None |
4223 | 2268 | prev_name = None | 2969 | prev_name = None |
4225 | 2269 | non_ind = {} | 2970 | subsets = [] |
4226 | 2971 | identical = [] | ||
4227 | 2972 | is_identical = False | ||
4228 | 2270 | for data in data_ind: | 2973 | for data in data_ind: |
4229 | 2271 | name = data[0] | 2974 | name = data[0] |
4230 | 2272 | char = data[1] | 2975 | char = data[1] |
4231 | @@ -2275,22 +2978,71 @@ | |||
4232 | 2275 | # when sorted, the longer list comes first | 2978 | # when sorted, the longer list comes first |
4233 | 2276 | if set(taxa).issubset(set(prev_taxa)): | 2979 | if set(taxa).issubset(set(prev_taxa)): |
4234 | 2277 | if (taxa == prev_taxa): | 2980 | if (taxa == prev_taxa): |
4236 | 2278 | non_ind[name] = [prev_name,IDENTICAL] | 2981 | if (is_identical): |
4237 | 2982 | identical[-1].append(name) | ||
4238 | 2983 | else: | ||
4239 | 2984 | identical.append([name,prev_name]) | ||
4240 | 2985 | is_identical = True | ||
4241 | 2986 | |||
4242 | 2279 | else: | 2987 | else: |
4244 | 2280 | non_ind[name] = [prev_name,SUBSET] | 2988 | subsets.append([prev_name, name]) |
4245 | 2989 | prev_name = name | ||
4246 | 2990 | is_identical = False | ||
4247 | 2991 | else: | ||
4248 | 2992 | prev_name = name | ||
4249 | 2993 | is_identical = False | ||
4250 | 2994 | else: | ||
4251 | 2995 | prev_name = name | ||
4252 | 2996 | is_identical = False | ||
4253 | 2997 | |||
4254 | 2281 | prev_char = char | 2998 | prev_char = char |
4255 | 2282 | prev_taxa = taxa | 2999 | prev_taxa = taxa |
4258 | 2283 | prev_name = name | 3000 | |
4257 | 2284 | |||
4259 | 2285 | if (make_new_xml): | 3001 | if (make_new_xml): |
4260 | 2286 | new_xml = XML | 3002 | new_xml = XML |
4264 | 2287 | for name in non_ind: | 3003 | # deal with subsets |
4265 | 2288 | if (non_ind[name][1] == SUBSET): | 3004 | for s in subsets: |
4266 | 2289 | new_xml = _swap_tree_in_XML(new_xml,None,name) | 3005 | new_xml = _swap_tree_in_XML(new_xml,None,s[1]) |
4267 | 2290 | new_xml = clean_data(new_xml) | 3006 | new_xml = clean_data(new_xml) |
4269 | 2291 | return non_ind, new_xml | 3007 | # deal with identical - weight them, if there's 3, weights are 1/3, i.e. |
4270 | 3008 | # weights are 1/no of identical trees | ||
4271 | 3009 | for i in identical: | ||
4272 | 3010 | weight = 1.0 / float(len(i)) | ||
4273 | 3011 | new_xml = add_weights(new_xml, i, weight) | ||
4274 | 3012 | |||
4275 | 3013 | return identical, subsets, new_xml | ||
4276 | 2292 | else: | 3014 | else: |
4278 | 2293 | return non_ind | 3015 | return identical, subsets |
4279 | 3016 | |||
4280 | 3017 | |||
4281 | 3018 | def add_weights(XML, names, weight): | ||
4282 | 3019 | """ Add weights to trees: supply an array of tree names and a weight, and they get set. | ||
4283 | 3020 | Returns a new XML | ||
4284 | 3021 | """ | ||
4285 | 3022 | |||
4286 | 3023 | xml_root = _parse_xml(XML) | ||
4287 | 3024 | # By getting source, we can then loop over each source_tree | ||
4288 | 3025 | find = etree.XPath("//source_tree") | ||
4289 | 3026 | sources = find(xml_root) | ||
4290 | 3027 | for s in sources: | ||
4291 | 3028 | s_name = s.attrib['name'] | ||
4292 | 3029 | for n in names: | ||
4293 | 3030 | if s_name == n: | ||
4294 | 3031 | if s.xpath("tree/weight/real_value") == []: | ||
4295 | 3032 | # add weights | ||
4296 | 3033 | weights_element = etree.Element("weight") | ||
4297 | 3034 | weights_element.tail="\n" | ||
4298 | 3035 | real_value = etree.SubElement(weights_element,'real_value') | ||
4299 | 3036 | real_value.attrib['rank'] = '0' | ||
4300 | 3037 | real_value.tail = '\n' | ||
4301 | 3038 | real_value.text = str(weight) | ||
4302 | 3039 | t = s.xpath("tree")[0] | ||
4303 | 3040 | t.append(weights_element) | ||
4304 | 3041 | else: | ||
4305 | 3042 | s.xpath("tree/weight/real_value")[0].text = str(weight) | ||
4306 | 3043 | |||
4307 | 3044 | return etree.tostring(xml_root,pretty_print=True) | ||
4308 | 3045 | |||
4309 | 2294 | 3046 | ||
4310 | 2295 | def add_historical_event(XML, event_description): | 3047 | def add_historical_event(XML, event_description): |
4311 | 2296 | """ | 3048 | """ |
4312 | @@ -2380,8 +3132,15 @@ | |||
4313 | 2380 | # check trees are informative | 3132 | # check trees are informative |
4314 | 2381 | XML = _check_informative_trees(XML,delete=True) | 3133 | XML = _check_informative_trees(XML,delete=True) |
4315 | 2382 | 3134 | ||
4316 | 3135 | |||
4317 | 2383 | # check sources | 3136 | # check sources |
4318 | 2384 | XML = _check_sources(XML,delete=True) | 3137 | XML = _check_sources(XML,delete=True) |
4319 | 3138 | XML = all_sourcenames(XML) | ||
4320 | 3139 | |||
4321 | 3140 | # fix tree names | ||
4322 | 3141 | XML = set_unique_names(XML) | ||
4323 | 3142 | XML = set_all_tree_names(XML,overwrite=True) | ||
4324 | 3143 | |||
4325 | 2385 | 3144 | ||
4326 | 2386 | # unpermutable trees | 3145 | # unpermutable trees |
4327 | 2387 | permutable_trees = _find_trees_for_permuting(XML) | 3146 | permutable_trees = _find_trees_for_permuting(XML) |
4328 | @@ -2659,7 +3418,7 @@ | |||
4329 | 2659 | s.getparent().remove(s) | 3418 | s.getparent().remove(s) |
4330 | 2660 | 3419 | ||
4331 | 2661 | # edit name (append _subset) | 3420 | # edit name (append _subset) |
4333 | 2662 | proj_name = xml_root.xpath('/phylo_storage/project_name/string_value')[0].text | 3421 | proj_name = get_project_name(XML) |
4334 | 2663 | proj_name += "_subset" | 3422 | proj_name += "_subset" |
4335 | 2664 | xml_root.xpath('/phylo_storage/project_name/string_value')[0].text = proj_name | 3423 | xml_root.xpath('/phylo_storage/project_name/string_value')[0].text = proj_name |
4336 | 2665 | 3424 | ||
4337 | @@ -2928,6 +3687,37 @@ | |||
4338 | 2928 | 3687 | ||
4339 | 2929 | return mrca | 3688 | return mrca |
4340 | 2930 | 3689 | ||
4341 | 3690 | |||
4342 | 3691 | def tree_from_taxonomy(taxonomy, end_level, end_rank): | ||
4343 | 3692 | """Create a tree from a taxonomy data structure. | ||
4344 | 3693 | This is not the most efficient way, but works OK | ||
4345 | 3694 | """ | ||
4346 | 3695 | |||
4347 | 3696 | # Grab data only for the end_level classification | ||
4348 | 3697 | required_taxonomy = {} | ||
4349 | 3698 | for t in taxonomy: | ||
4350 | 3699 | if (end_level in t): | ||
4351 | 3700 | required_taxonomy[t] = taxonomy[t] | ||
4352 | 3701 | |||
4353 | 3702 | rank_index = taxonomy_levels.index(end_rank) | ||
4354 | 3703 | |||
4355 | 3704 | # create basic string | ||
4356 | 3705 | |||
4357 | 3706 | # get unique otus | ||
4358 | 3707 | |||
4359 | 3708 | # sort by the subfamily | ||
4360 | 3709 | |||
4361 | 3710 | # for each genus create a newick string | ||
4362 | 3711 | |||
4363 | 3712 | # if it's the same grouping as previous, add as sister clade (i.e. ,) | ||
4364 | 3713 | # else, prepend a (, append a ) and add new clade (ie. ,) | ||
4365 | 3714 | |||
4366 | 3715 | |||
4367 | 3716 | # return tree | ||
4368 | 3717 | |||
4369 | 3718 | |||
4370 | 3719 | |||
4371 | 3720 | |||
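tree_from_taxonomy above is still only an outline; as a very rough sketch of the kind of grouping its comments describe (this is not the toolkit's implementation), one could emit one polytomy per group and join the groups into a single tree:

    def _tree_from_taxonomy_sketch(taxonomy, group_rank='family'):
        # bucket each OTU under its value for group_rank, then write one polytomy per group
        groups = {}
        for taxon, ranks in taxonomy.items():
            group = ranks.get(group_rank)
            if group is None:
                continue
            groups.setdefault(group, []).append(taxon.replace(' ', '_'))
        clades = []
        for group in sorted(groups):
            members = sorted(groups[group])
            clades.append(members[0] if len(members) == 1 else "(" + ",".join(members) + ")")
        return "(" + ",".join(clades) + ");"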
4372 | 2931 | ################ PRIVATE FUNCTIONS ######################## | 3721 | ################ PRIVATE FUNCTIONS ######################## |
4373 | 2932 | 3722 | ||
4374 | 2933 | def _uniquify(l): | 3723 | def _uniquify(l): |
4375 | @@ -2975,13 +3765,25 @@ | |||
4376 | 2975 | "The source names in the dataset are not unique. Please run the auto-name function on these data. Name: "+name+"\n" | 3765 | "The source names in the dataset are not unique. Please run the auto-name function on these data. Name: "+name+"\n" |
4377 | 2976 | last_name = name | 3766 | last_name = name |
4378 | 2977 | 3767 | ||
4379 | 3768 | # do same for tree names: | ||
4380 | 3769 | names = get_all_tree_names(XML) | ||
4381 | 3770 | names.sort() | ||
4382 | 3771 | last_name = "" # This will actually throw an non-unique error if a name is empty | ||
4383 | 3772 | # not great, but still an error! | ||
4384 | 3773 | for name in names: | ||
4385 | 3774 | if name == last_name: | ||
4386 | 3775 | # if non-unique throw exception | ||
4387 | 3776 | message = message + \ | ||
4388 | 3777 | "The tree names in the dataset are not unique. Please run the auto-name function on these data with replace or edit by hand. Name: "+name+"\n" | ||
4389 | 3778 | last_name = name | ||
4390 | 3779 | |||
4391 | 2978 | if (not message == ""): | 3780 | if (not message == ""): |
4392 | 2979 | raise NotUniqueError(message) | 3781 | raise NotUniqueError(message) |
4393 | 2980 | 3782 | ||
4394 | 2981 | return | 3783 | return |
4395 | 2982 | 3784 | ||
4396 | 2983 | 3785 | ||
4398 | 2984 | def _assemble_tree_matrix(tree_string): | 3786 | def _assemble_tree_matrix(tree_string, verbose=False): |
4399 | 2985 | """ Assembles the MRP matrix for an individual tree | 3787 | """ Assembles the MRP matrix for an individual tree |
4400 | 2986 | 3788 | ||
4401 | 2987 | returns: matrix (2D numpy array: taxa on i, nodes on j) | 3789 | returns: matrix (2D numpy array: taxa on i, nodes on j) |
4402 | @@ -3009,7 +3811,7 @@ | |||
4403 | 3009 | for i in range(0,len(names)): | 3811 | for i in range(0,len(names)): |
4404 | 3010 | adjmat.append([1]) | 3812 | adjmat.append([1]) |
4405 | 3011 | adjmat = numpy.array(adjmat) | 3813 | adjmat = numpy.array(adjmat) |
4407 | 3012 | 3814 | if verbose: | |
4408 | 3013 | print "Warning: Found uninformative tree in data. Including it in the matrix anyway" | 3815 | print "Warning: Found uninformative tree in data. Including it in the matrix anyway" |
4409 | 3014 | 3816 | ||
4410 | 3015 | return adjmat, names | 3817 | return adjmat, names |
4411 | @@ -3020,7 +3822,7 @@ | |||
4412 | 3020 | 3822 | ||
4413 | 3021 | If the new_taxa array is missing, simply delete the old_taxa | 3823 | If the new_taxa array is missing, simply delete the old_taxa |
4414 | 3022 | """ | 3824 | """ |
4416 | 3023 | 3825 | ||
4417 | 3024 | tree = _correctly_quote_taxa(tree) | 3826 | tree = _correctly_quote_taxa(tree) |
4418 | 3025 | # are the input values lists or simple strings? | 3827 | # are the input values lists or simple strings? |
4419 | 3026 | if (isinstance(old_taxa,str)): | 3828 | if (isinstance(old_taxa,str)): |
4420 | @@ -3564,7 +4366,7 @@ | |||
4421 | 3564 | 4366 | ||
4422 | 3565 | return permute_trees | 4367 | return permute_trees |
4423 | 3566 | 4368 | ||
4425 | 3567 | def _create_matrix(trees, taxa, format="hennig", quote=False, weights=None): | 4369 | def _create_matrix(trees, taxa, format="hennig", quote=False, weights=None, verbose=False): |
4426 | 3568 | """ | 4370 | """ |
4427 | 3569 | Does the hard work on creating a matrix | 4371 | Does the hard work on creating a matrix |
4428 | 3570 | """ | 4372 | """ |
4429 | @@ -3585,7 +4387,7 @@ | |||
4430 | 3585 | if (not weights == None): | 4387 | if (not weights == None): |
4431 | 3586 | weight = weights[key] | 4388 | weight = weights[key] |
4432 | 3587 | names.append(key) | 4389 | names.append(key) |
4434 | 3588 | submatrix, tree_taxa = _assemble_tree_matrix(trees[key]) | 4390 | submatrix, tree_taxa = _assemble_tree_matrix(trees[key], verbose=verbose) |
4435 | 3589 | nChars = len(submatrix[0,:]) | 4391 | nChars = len(submatrix[0,:]) |
4436 | 3590 | # loop over characters in the submatrix | 4392 | # loop over characters in the submatrix |
4437 | 3591 | for i in range(1,nChars): | 4393 | for i in range(1,nChars): |
4438 | @@ -3637,7 +4439,7 @@ | |||
4439 | 3637 | matrix_string += string + "\n" | 4439 | matrix_string += string + "\n" |
4440 | 3638 | i += 1 | 4440 | i += 1 |
4441 | 3639 | 4441 | ||
4443 | 3640 | matrix_string += "\t;\n" | 4442 | matrix_string += "\n" |
4444 | 3641 | if (not weights == None): | 4443 | if (not weights == None): |
4445 | 3642 | # get unique weights | 4444 | # get unique weights |
4446 | 3643 | unique_weights = _uniquify(weights) | 4445 | unique_weights = _uniquify(weights) |
4447 | @@ -3652,7 +4454,7 @@ | |||
4448 | 3652 | matrix_string += " " + str(i) | 4454 | matrix_string += " " + str(i) |
4449 | 3653 | i += 1 | 4455 | i += 1 |
4450 | 3654 | matrix_string += ";\n" | 4456 | matrix_string += ";\n" |
4452 | 3655 | matrix_string += "procedure /;" | 4457 | matrix_string += "proc /;" |
4453 | 3656 | elif (format == 'nexus'): | 4458 | elif (format == 'nexus'): |
4454 | 3657 | matrix_string = "#nexus\n\nbegin data;\n" | 4459 | matrix_string = "#nexus\n\nbegin data;\n" |
4455 | 3658 | matrix_string += "\tdimensions ntax = "+str(len(taxa)) +" nchar = "+str(last_char)+";\n" | 4460 | matrix_string += "\tdimensions ntax = "+str(len(taxa)) +" nchar = "+str(last_char)+";\n" |
4456 | 3659 | 4461 | ||
4457 | === modified file 'stk/test/_substitute_taxa.py' | |||
4458 | --- stk/test/_substitute_taxa.py 2016-07-14 10:12:17 +0000 | |||
4459 | +++ stk/test/_substitute_taxa.py 2017-01-12 09:27:31 +0000 | |||
4460 | @@ -10,6 +10,7 @@ | |||
4461 | 10 | from stk.supertree_toolkit import check_subs, _tree_contains, _correctly_quote_taxa, _remove_single_poly_taxa | 10 | from stk.supertree_toolkit import check_subs, _tree_contains, _correctly_quote_taxa, _remove_single_poly_taxa |
4462 | 11 | from stk.supertree_toolkit import _swap_tree_in_XML, substitute_taxa, get_all_taxa, _parse_tree, _delete_taxon | 11 | from stk.supertree_toolkit import _swap_tree_in_XML, substitute_taxa, get_all_taxa, _parse_tree, _delete_taxon |
4463 | 12 | from stk.supertree_toolkit import _collapse_nodes, import_tree, subs_from_csv, _getTaxaFromNewick, obtain_trees | 12 | from stk.supertree_toolkit import _collapse_nodes, import_tree, subs_from_csv, _getTaxaFromNewick, obtain_trees |
4464 | 13 | from stk.supertree_toolkit import generate_species_level_data | ||
4465 | 13 | from lxml import etree | 14 | from lxml import etree |
4466 | 14 | from util import * | 15 | from util import * |
4467 | 15 | from stk.stk_exceptions import * | 16 | from stk.stk_exceptions import * |
4468 | @@ -776,7 +777,24 @@ | |||
4469 | 776 | new_tree = _sub_taxa_in_tree(tree2,"Thereuopodina",sub_in,skip_existing=True); | 777 | new_tree = _sub_taxa_in_tree(tree2,"Thereuopodina",sub_in,skip_existing=True); |
4470 | 777 | self.assert_(answer2, new_tree) | 778 | self.assert_(answer2, new_tree) |
4471 | 778 | 779 | ||
4473 | 779 | 780 | ||
4474 | 781 | def test_auto_subs_taxonomy(self): | ||
4475 | 782 | """test the automatic subs function with a simple test""" | ||
4476 | 783 | XML = etree.tostring(etree.parse('data/input/auto_sub.phyml',parser),pretty_print=True) | ||
4477 | 784 | taxonomy = {'Ardea goliath': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Ecdysozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Ardea', 'order': 'Pelecaniformes', 'species': 'Ardea goliath'}, | ||
4478 | 785 | 'Pelecaniformes': {'kingdom': 'Animalia', 'phylum': 'Chordata', 'order': 'Pelecaniformes', 'class': 'Aves', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013'}, 'Gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes'}, | ||
4479 | 786 | 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'infraphylum': 'Gnathostomata', 'superclass': 'Tetrapoda', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'order': 'Procellariiformes', 'species': 'Thalassarche melanophris'}, | ||
4480 | 787 | 'Platalea leucorodia': {'kingdom': 'Animalia', 'subfamily': 'Plataleinae', 'family': 'Threskiornithidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'infraphylum': 'Gnathostomata', 'superclass': 'Tetrapoda', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Platalea', 'order': 'Pelecaniformes', 'species': 'Platalea leucorodia'}, | ||
4481 | 788 | 'Gallus lafayetii': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus lafayetii'}, | ||
4482 | 789 | 'Ardea humbloti': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Ecdysozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Ardea', 'order': 'Pelecaniformes', 'species': 'Ardea humbloti'}, | ||
4483 | 790 | 'Gallus varius': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus varius'}} | ||
4484 | 791 | XML = generate_species_level_data(XML, taxonomy) | ||
4485 | 792 | expected_XML = etree.tostring(etree.parse('data/output/one_click_subs_output.phyml',parser),pretty_print=True) | ||
4486 | 793 | trees = obtain_trees(XML) | ||
4487 | 794 | expected_trees = obtain_trees(expected_XML) | ||
4488 | 795 | for t in trees: | ||
4489 | 796 | self.assert_(_trees_equal(trees[t], expected_trees[t])) | ||
4490 | 797 | |||
4491 | 780 | def test_parrot_edge_case(self): | 798 | def test_parrot_edge_case(self): |
4492 | 781 | """Random edge case where the tree disappeared...""" | 799 | """Random edge case where the tree disappeared...""" |
4493 | 782 | trees = ["(((((((Agapornis_lilianae, Agapornis_nigrigenis), Agapornis_personata, Agapornis_fischeri), Agapornis_roseicollis), (Agapornis_pullaria, Agapornis_taranta)), Agapornis_cana), Loriculus_galgulus), Geopsittacus_occidentalis);"] | 800 | trees = ["(((((((Agapornis_lilianae, Agapornis_nigrigenis), Agapornis_personata, Agapornis_fischeri), Agapornis_roseicollis), (Agapornis_pullaria, Agapornis_taranta)), Agapornis_cana), Loriculus_galgulus), Geopsittacus_occidentalis);"] |
4494 | 783 | 801 | ||
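
The new test_auto_subs_taxonomy test above passes generate_species_level_data an XML string plus a taxonomy dictionary keyed by taxon name, where each value maps taxonomic ranks (plus a 'provider' entry) to names. A minimal sketch of that call, reusing one entry from the test's dictionary; it assumes the stk package is importable and, as in the test, that the data path resolves from stk/test:

    from lxml import etree
    from stk.supertree_toolkit import generate_species_level_data

    # One entry in the same shape as the dictionary used by test_auto_subs_taxonomy:
    # ranks map to names, and 'provider' records where the classification came from.
    taxonomy = {
        'Gallus varius': {'kingdom': 'Animalia', 'class': 'Aves', 'order': 'Galliformes',
                          'family': 'Phasianidae', 'genus': 'Gallus', 'species': 'Gallus varius',
                          'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013'},
    }
    XML = etree.tostring(etree.parse('data/input/auto_sub.phyml'), pretty_print=True)
    XML = generate_species_level_data(XML, taxonomy)
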
4495 | === modified file 'stk/test/_supertree_toolkit.py' | |||
4496 | --- stk/test/_supertree_toolkit.py 2015-03-26 09:58:58 +0000 | |||
4497 | +++ stk/test/_supertree_toolkit.py 2017-01-12 09:27:31 +0000 | |||
4498 | @@ -7,12 +7,13 @@ | |||
4499 | 7 | import os | 7 | import os |
4500 | 8 | stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir, os.pardir ) | 8 | stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir, os.pardir ) |
4501 | 9 | sys.path.insert(0, stk_path) | 9 | sys.path.insert(0, stk_path) |
4503 | 10 | from stk.supertree_toolkit import _check_uniqueness, _check_taxa, _check_data, get_all_characters, data_independence | 10 | from stk.supertree_toolkit import _check_uniqueness, _check_taxa, _check_data, get_all_characters, data_independence, add_weights |
4504 | 11 | from stk.supertree_toolkit import get_fossil_taxa, get_publication_years, data_summary, get_character_numbers, get_analyses_used | 11 | from stk.supertree_toolkit import get_fossil_taxa, get_publication_years, data_summary, get_character_numbers, get_analyses_used |
4505 | 12 | from stk.supertree_toolkit import data_overlap, read_matrix, subs_file_from_str, clean_data, obtain_trees, get_all_source_names | 12 | from stk.supertree_toolkit import data_overlap, read_matrix, subs_file_from_str, clean_data, obtain_trees, get_all_source_names |
4506 | 13 | from stk.supertree_toolkit import add_historical_event, _sort_data, _parse_xml, _check_sources, _swap_tree_in_XML, replace_genera | 13 | from stk.supertree_toolkit import add_historical_event, _sort_data, _parse_xml, _check_sources, _swap_tree_in_XML, replace_genera |
4507 | 14 | from stk.supertree_toolkit import get_all_taxa, _get_all_siblings, _parse_tree, get_characters_used, _trees_equal, get_weights | 14 | from stk.supertree_toolkit import get_all_taxa, _get_all_siblings, _parse_tree, get_characters_used, _trees_equal, get_weights |
4509 | 15 | from stk.supertree_toolkit import get_outgroup, set_all_tree_names, create_tree_name, load_taxonomy | 15 | from stk.supertree_toolkit import get_outgroup, set_all_tree_names, create_tree_name, taxonomic_checker, load_taxonomy, load_equivalents |
4510 | 16 | from stk.supertree_toolkit import create_taxonomy, create_taxonomy_from_tree, get_all_tree_names | ||
4511 | 16 | from lxml import etree | 17 | from lxml import etree |
4512 | 17 | from util import * | 18 | from util import * |
4513 | 18 | from stk.stk_exceptions import * | 19 | from stk.stk_exceptions import * |
4514 | @@ -268,19 +269,52 @@ | |||
4515 | 268 | 269 | ||
4516 | 269 | def test_data_independence(self): | 270 | def test_data_independence(self): |
4517 | 270 | XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True) | 271 | XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True) |
4521 | 271 | expected_dict = {'Hill_2011_2': ['Hill_2011_1', 1], 'Hill_Davis_2011_1': ['Hill_Davis_2011_2', 0]} | 272 | expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']] |
4522 | 272 | non_ind = data_independence(XML) | 273 | non_ind,subsets = data_independence(XML) |
4523 | 273 | self.assertDictEqual(expected_dict, non_ind) | 274 | expected_subsets = [['Hill_2011_1', 'Hill_2011_2']] |
4524 | 275 | self.assertListEqual(expected_subsets, subsets) | ||
4525 | 276 | self.assertListEqual(expected_idents, non_ind) | ||
4526 | 274 | 277 | ||
4528 | 275 | def test_data_independence(self): | 278 | def test_data_independence_2(self): |
4529 | 276 | XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True) | 279 | XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True) |
4533 | 277 | expected_dict = {'Hill_2011_2': ['Hill_2011_1', 1], 'Hill_Davis_2011_1': ['Hill_Davis_2011_2', 0]} | 280 | expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']] |
4534 | 278 | non_ind, new_xml = data_independence(XML,make_new_xml=True) | 281 | expected_subsets = [['Hill_2011_1', 'Hill_2011_2']] |
4535 | 279 | self.assertDictEqual(expected_dict, non_ind) | 282 | non_ind, subset, new_xml = data_independence(XML,make_new_xml=True) |
4536 | 283 | self.assertListEqual(expected_idents, non_ind) | ||
4537 | 284 | self.assertListEqual(expected_subsets, subset) | ||
4538 | 280 | # check the second tree has not been removed | 285 | # check the second tree has not been removed |
4539 | 281 | self.assertRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,H:1.00000)0.00000:0.00000)0.00000:0.00000;')) | 286 | self.assertRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,H:1.00000)0.00000:0.00000)0.00000:0.00000;')) |
4540 | 282 | # check that the first tree is removed | 287 | # check that the first tree is removed |
4541 | 283 | self.assertNotRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,(F:1.00000,E:1.00000)0.00000:0.00000)0.00000:0.00000;')) | 288 | self.assertNotRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,(F:1.00000,E:1.00000)0.00000:0.00000)0.00000:0.00000;')) |
4542 | 289 | |||
4543 | 290 | def test_add_weights(self): | ||
4544 | 291 | """Add weights to a bunch of trees""" | ||
4545 | 292 | XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True) | ||
4546 | 293 | # see above | ||
4547 | 294 | expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']] | ||
4548 | 295 | # so the first should end up with a weight of 0.33333 and the second with 0.5 | ||
4549 | 296 | for ei in expected_idents: | ||
4550 | 297 | weight = 1.0/float(len(ei)) | ||
4551 | 298 | XML = add_weights(XML, ei, weight) | ||
4552 | 299 | |||
4553 | 300 | expected_weights = [str(1.0/3.0), str(1.0/3.0), str(1.0/3.0), str(0.5), str(0.5)] | ||
4554 | 301 | weights_in_xml = [] | ||
4555 | 302 | # now check weights have been added to the correct part of the tree | ||
4556 | 303 | xml_root = _parse_xml(XML) | ||
4557 | 304 | i = 0 | ||
4558 | 305 | for ei in expected_idents: | ||
4559 | 306 | for tree in ei: | ||
4560 | 307 | find = etree.XPath("//source_tree") | ||
4561 | 308 | trees = find(xml_root) | ||
4562 | 309 | for t in trees: | ||
4563 | 310 | if t.attrib['name'] == tree: | ||
4564 | 311 | # check len(trees) == 0 | ||
4565 | 312 | weights_in_xml.append(t.xpath("tree/weight/real_value")[0].text) | ||
4566 | 313 | |||
4567 | 314 | self.assertListEqual(expected_weights,weights_in_xml) | ||
4568 | 315 | |||
4569 | 316 | |||
4570 | 317 | |||
4571 | 284 | 318 | ||
4572 | 285 | def test_overlap(self): | 319 | def test_overlap(self): |
4573 | 286 | XML = etree.tostring(etree.parse('data/input/check_overlap_ok.phyml',parser),pretty_print=True) | 320 | XML = etree.tostring(etree.parse('data/input/check_overlap_ok.phyml',parser),pretty_print=True) |
4574 | @@ -438,7 +472,7 @@ | |||
4575 | 438 | XML = clean_data(XML) | 472 | XML = clean_data(XML) |
4576 | 439 | trees = obtain_trees(XML) | 473 | trees = obtain_trees(XML) |
4577 | 440 | self.assert_(len(trees) == 2) | 474 | self.assert_(len(trees) == 2) |
4579 | 441 | expected_trees = {'Hill_2011_4': '(A,B,(C,D,E));', 'Hill_2011_2': '(A, B, C, (D, E, F));'} | 475 | expected_trees = {'Hill_2011_2': '(A,B,(C,D,E));', 'Hill_2011_1': '(A, B, C, (D, E, F));'} |
4580 | 442 | for t in trees: | 476 | for t in trees: |
4581 | 443 | self.assert_(_trees_equal(trees[t],expected_trees[t])) | 477 | self.assert_(_trees_equal(trees[t],expected_trees[t])) |
4582 | 444 | 478 | ||
4583 | @@ -558,18 +592,78 @@ | |||
4584 | 558 | self.assert_(c in expected_characters) | 592 | self.assert_(c in expected_characters) |
4585 | 559 | self.assert_(len(characters) == len(expected_characters)) | 593 | self.assert_(len(characters) == len(expected_characters)) |
4586 | 560 | 594 | ||
4587 | 595 | def test_create_taxonomy(self): | ||
4588 | 596 | XML = etree.tostring(etree.parse('data/input/create_taxonomy.phyml',parser),pretty_print=True) | ||
4589 | 597 | # Tested on 11/01/17 and EOL have changed the output | ||
4590 | 598 | # old_expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'superclass': 'Sarcopterygii', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Egretta tricolor', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Jeletzkytes criptonodosus': {'superfamily': 'Scaphitoidea', 'family': 'Scaphitidae', 'subkingdom': 'Metazoa', 'subclass': 'Ammonoidea', 'species': 'Jeletzkytes criptonodosus', 'phylum': 'Mollusca', 'suborder': 'Ancyloceratina', 'provider': 'Paleobiology Database', 'genus': 'Jeletzkytes', 'class': 'Cephalopoda'}} | ||
4591 | 599 | expected = {'Jeletzkytes criptonodosus': {'superfamily': 'Scaphitoidea', 'family': 'Scaphitidae', 'subkingdom': 'Metazoa', 'subclass': 'Ammonoidea', 'species': 'Jeletzkytes criptonodosus', 'phylum': 'Mollusca', 'suborder': 'Ancyloceratina', 'provider': 'Paleobiology Database', 'genus': 'Jeletzkytes', 'class': 'Cephalopoda'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'infraspecies': 'Egretta', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': ['Egretta', 'tricolor'], 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Archaeopteryx lithographica': {'genus': 'Archaeopteryx', 'provider': 'Paleobiology Database'}} | ||
4592 | 600 | if (internet_on()): | ||
4593 | 601 | taxonomy = create_taxonomy(XML) | ||
4594 | 602 | self.maxDiff = None | ||
4595 | 603 | self.assertDictEqual(taxonomy, expected) | ||
4596 | 604 | else: | ||
4597 | 605 | print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the taxonomy_checker function" | ||
4598 | 606 | return | ||
4599 | 607 | |||
4600 | 608 | def test_create_taxonomy_from_tree(self): | ||
4601 | 609 | """Tests taxonomy creation from a tree. Uses the same data as the normal XML test but works directly on the tree instead of parsing the XML.""" | ||
4602 | 610 | # Tested on 11/01/17 and this no longer worked, but is correct! EOL returned something different. | ||
4603 | 611 | #old_expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'superclass': 'Sarcopterygii', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Egretta tricolor', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}} | ||
4604 | 612 | expected = {'Archaeopteryx lithographica': {'genus': 'Archaeopteryx', 'provider': 'Paleobiology Database'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'infraspecies': 'Egretta', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': ['Egretta', 'tricolor'], 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}} | ||
4605 | 613 | tree = "(Archaeopteryx_lithographica, (Gallus_gallus, (Thalassarche_melanophris, Egretta_tricolor)));" | ||
4606 | 614 | if (internet_on()): | ||
4607 | 615 | taxonomy = create_taxonomy_from_tree(tree) | ||
4608 | 616 | self.maxDiff = None | ||
4609 | 617 | self.assertDictEqual(taxonomy, expected) | ||
4610 | 618 | else: | ||
4611 | 619 | print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the create_taxonomy function" | ||
4612 | 620 | return | ||
4613 | 621 | |||
4614 | 622 | def test_taxonomy_checker(self): | ||
4615 | 623 | expected = {'Thalassarche_melanophrys': [['Thalassarche_melanophris', 'Thalassarche_melanophrys', 'Diomedea_melanophris', 'Thalassarche_[melanophrys', 'Diomedea_melanophrys'], 'amber'], 'Egretta_tricolor': [['Egretta_tricolor'], 'green'], 'Gallus_gallus': [['Gallus_gallus'], 'green']} | ||
4616 | 624 | XML = etree.tostring(etree.parse('data/input/check_taxonomy.phyml',parser),pretty_print=True) | ||
4617 | 625 | if (internet_on()): | ||
4618 | 626 | equivs = taxonomic_checker(XML) | ||
4619 | 627 | self.maxDiff = None | ||
4620 | 628 | self.assertDictEqual(equivs, expected) | ||
4621 | 629 | else: | ||
4622 | 630 | print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the taxonomy_checker function" | ||
4623 | 631 | return | ||
4624 | 632 | |||
4625 | 633 | def test_taxonomy_checker2(self): | ||
4626 | 634 | XML = etree.tostring(etree.parse('data/input/check_taxonomy_fixes.phyml',parser),pretty_print=True) | ||
4627 | 635 | if (internet_on()): | ||
4628 | 636 | # This test is a bit dodgy as it depends on EOL's server speed. Run it a few times before deciding it's broken. | ||
4629 | 637 | equivs = taxonomic_checker(XML,verbose=False) | ||
4630 | 638 | self.maxDiff = None | ||
4631 | 639 | self.assert_(equivs['Agathamera_crassa'][0][0] == 'Agathemera_crassa') | ||
4632 | 640 | self.assert_(equivs['Celatoblatta_brunni'][0][0] == 'Maoriblatta_brunni') | ||
4633 | 641 | self.assert_(equivs['Blatta_lateralis'][1] == 'amber') | ||
4634 | 642 | else: | ||
4635 | 643 | print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the taxonomy_checker function" | ||
4636 | 644 | return | ||
4637 | 645 | |||
4638 | 646 | |||
4639 | 561 | def test_load_taxonomy(self): | 647 | def test_load_taxonomy(self): |
4640 | 562 | csv_file = "data/input/create_taxonomy.csv" | 648 | csv_file = "data/input/create_taxonomy.csv" |
4646 | 563 | expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, | 649 | expected = {'Jeletzkytes_criptonodosus': {'kingdom': 'Metazoa', 'subclass': 'Cephalopoda', 'species': 'Jeletzkytes criptonodosus', 'suborder': 'Ammonoidea', 'provider': 'PBDB', 'subfamily': 'Scaphitidae', 'class': 'Mollusca'}, 'Archaeopteryx_lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Egretta_tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'subkingdom': 'Bilateria', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'subclass': 'Neoloricata', 'species': 'Egretta tricolor', 'phylum': 'Chordata', 'suborder': 'Ischnochitonina', 'superphylum': 'Lophozoa', 'infrakingdom': 'Protostomia', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus_gallus': {'kingdom': 'Animalia', 'superorder': 'Galliformes', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'class': 'Aves'}, 'Thalassarche_melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'species': 'Thalassarche melanophris', 'order': 'Procellariiformes', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'class': 'Aves'}} |
4642 | 564 | 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'subclass': 'Neoloricata', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'suborder': 'Ischnochitonina', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Egretta', 'order': 'Pelecaniformes', 'species': 'Egretta tricolor'}, | ||
4643 | 565 | 'Gallus gallus': {'kingdom': 'Animalia', 'infrakingdom': 'Protostomia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus gallus'}, | ||
4644 | 566 | 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'order': 'Procellariiformes', 'species': 'Thalassarche melanophris'}, | ||
4645 | 567 | 'Jeletzkytes criptonodosus': {'kingdom': 'Metazoa', 'family': 'Scaphitidae', 'order': 'Ammonoidea', 'phylum': 'Mollusca', 'provider': 'PBDB', 'species': 'Jeletzkytes criptonodosus', 'class': 'Cephalopoda'}} | ||
4647 | 568 | taxonomy = load_taxonomy(csv_file) | 650 | taxonomy = load_taxonomy(csv_file) |
4648 | 569 | self.maxDiff = None | 651 | self.maxDiff = None |
4649 | 570 | 652 | ||
4650 | 571 | self.assertDictEqual(taxonomy, expected) | 653 | self.assertDictEqual(taxonomy, expected) |
4651 | 572 | 654 | ||
4652 | 655 | |||
4653 | 656 | def test_load_equivalents(self): | ||
4654 | 657 | csv_file = "data/input/equivalents.csv" | ||
4655 | 658 | expected = {'Turnix_sylvatica': [['Turnix_sylvaticus','Tetrao_sylvaticus','Tetrao_sylvatica','Turnix_sylvatica'],'yellow'], | ||
4656 | 659 | 'Xiphorhynchus_pardalotus':[['Xiphorhynchus_pardalotus'],'green'], | ||
4657 | 660 | 'Phaenicophaeus_curvirostris':[['Zanclostomus_curvirostris','Rhamphococcyx_curvirostris','Phaenicophaeus_curvirostris','Rhamphococcyx_curvirostr'],'yellow'], | ||
4658 | 661 | 'Megalapteryx_benhami':[['Megalapteryx_benhami'],'red'] | ||
4659 | 662 | } | ||
4660 | 663 | equivalents = load_equivalents(csv_file) | ||
4661 | 664 | self.assertDictEqual(equivalents, expected) | ||
4662 | 665 | |||
4663 | 666 | |||
4664 | 573 | def test_name_tree(self): | 667 | def test_name_tree(self): |
4665 | 574 | XML = etree.tostring(etree.parse('data/input/single_source_no_names.phyml',parser),pretty_print=True) | 668 | XML = etree.tostring(etree.parse('data/input/single_source_no_names.phyml',parser),pretty_print=True) |
4666 | 575 | xml_root = _parse_xml(XML) | 669 | xml_root = _parse_xml(XML) |
4667 | @@ -583,6 +677,35 @@ | |||
4668 | 583 | XML = etree.tostring(etree.parse('data/input/single_source.phyml',parser),pretty_print=True) | 677 | XML = etree.tostring(etree.parse('data/input/single_source.phyml',parser),pretty_print=True) |
4669 | 584 | self.assert_(isEqualXML(new_xml,XML)) | 678 | self.assert_(isEqualXML(new_xml,XML)) |
4670 | 585 | 679 | ||
4671 | 680 | def test_all_rename_tree(self): | ||
4672 | 681 | XML = etree.tostring(etree.parse('data/input/single_source_same_tree_name.phyml',parser),pretty_print=True) | ||
4673 | 682 | new_xml = set_all_tree_names(XML,overwrite=True) | ||
4674 | 683 | XML = etree.tostring(etree.parse('data/output/single_source_same_tree_name.phyml',parser),pretty_print=True) | ||
4675 | 684 | self.assert_(isEqualXML(new_xml,XML)) | ||
4676 | 685 | |||
4677 | 686 | def test_get_all_tree_names(self): | ||
4678 | 687 | XML = etree.tostring(etree.parse('data/input/single_source_same_tree_name.phyml',parser),pretty_print=True) | ||
4679 | 688 | names = get_all_tree_names(XML) | ||
4680 | 689 | self.assertListEqual(names,['Hill_2011_2','Hill_2011_2']) | ||
4681 | 690 | |||
4682 | 691 | |||
4683 | 692 | def internet_on(host="8.8.8.8", port=443, timeout=5): | ||
4684 | 693 | import socket | ||
4685 | 694 | |||
4686 | 695 | """ | ||
4687 | 696 | Checks for a working network connection by opening a TCP socket | ||
4688 | 697 | to host (default 8.8.8.8, Google's public DNS server) on the | ||
4689 | 698 | given port (443 by default). | ||
4690 | 699 | """ | ||
4691 | 700 | try: | ||
4692 | 701 | socket.setdefaulttimeout(timeout) | ||
4693 | 702 | socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect((host, port)) | ||
4694 | 703 | return True | ||
4695 | 704 | except Exception as ex: | ||
4696 | 705 | print ex.message | ||
4697 | 706 | return False | ||
4698 | 707 | |||
4699 | 708 | |||
4700 | 586 | 709 | ||
4701 | 587 | if __name__ == '__main__': | 710 | if __name__ == '__main__': |
4702 | 588 | unittest.main() | 711 | unittest.main() |
4703 | 589 | 712 | ||
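
The test_add_weights test above encodes the down-weighting convention for non-independent source trees: every tree in a group of identical (or subset) trees receives a weight of one over the group size, so a trio gets 1/3 each and a pair gets 1/2 each. A minimal standalone sketch of that bookkeeping, with the group contents copied from the test's expected_idents; the weights dict is illustrative and not part of the toolkit API:

    # Each group of non-independent trees shares out a total weight of 1.0
    # (as checked in test_add_weights above).
    identical_groups = [
        ['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'],
        ['Hill_Davis_2013_1', 'Hill_Davis_2013_2'],
    ]
    weights = {}
    for group in identical_groups:
        weight = 1.0 / float(len(group))
        for tree_name in group:
            weights[tree_name] = weight
    print weights
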
4704 | === modified file 'stk/test/_trees.py' | |||
4705 | --- stk/test/_trees.py 2015-03-26 09:58:58 +0000 | |||
4706 | +++ stk/test/_trees.py 2017-01-12 09:27:31 +0000 | |||
4707 | @@ -5,7 +5,7 @@ | |||
4708 | 5 | sys.path.insert(0,"../../") | 5 | sys.path.insert(0,"../../") |
4709 | 6 | from stk.supertree_toolkit import import_tree, obtain_trees, get_all_taxa, _assemble_tree_matrix, create_matrix, _delete_taxon, _sub_taxon,_tree_contains | 6 | from stk.supertree_toolkit import import_tree, obtain_trees, get_all_taxa, _assemble_tree_matrix, create_matrix, _delete_taxon, _sub_taxon,_tree_contains |
4710 | 7 | from stk.supertree_toolkit import _swap_tree_in_XML, substitute_taxa, get_taxa_from_tree, get_characters_from_tree, amalgamate_trees, _uniquify | 7 | from stk.supertree_toolkit import _swap_tree_in_XML, substitute_taxa, get_taxa_from_tree, get_characters_from_tree, amalgamate_trees, _uniquify |
4712 | 8 | from stk.supertree_toolkit import import_trees, import_tree, _trees_equal, _find_trees_for_permuting, permute_tree, get_all_source_names, _getTaxaFromNewick | 8 | from stk.supertree_toolkit import import_trees, import_tree, _trees_equal, _find_trees_for_permuting, permute_tree, get_all_source_names, _getTaxaFromNewick, _parse_tree |
4713 | 9 | from stk.supertree_toolkit import get_mrca | 9 | from stk.supertree_toolkit import get_mrca |
4714 | 10 | import os | 10 | import os |
4715 | 11 | from lxml import etree | 11 | from lxml import etree |
4716 | @@ -215,6 +215,18 @@ | |||
4717 | 215 | mrca = get_mrca(tree,["A","I", "L"]) | 215 | mrca = get_mrca(tree,["A","I", "L"]) |
4718 | 216 | self.assert_(mrca == 8) | 216 | self.assert_(mrca == 8) |
4719 | 217 | 217 | ||
4720 | 218 | def test_get_mrca(self): | ||
4721 | 219 | tree = "(B,(C,(D,(E,((A,F),((I,(G,H)),(J,(K,L))))))));" | ||
4722 | 220 | mrca = get_mrca(tree,["A","F"]) | ||
4723 | 221 | print mrca | ||
4724 | 222 | #self.assert_(mrca == 8) | ||
4725 | 223 | to = _parse_tree('(X,Y,Z,(Q,W));') | ||
4726 | 224 | treeobj = _parse_tree(tree) | ||
4727 | 225 | newnode = treeobj.addNodeBetweenNodes(10,9) | ||
4728 | 226 | treeobj.addSubTree(newnode, to, ignoreRootAssert=True) | ||
4729 | 227 | treeobj.draw() | ||
4730 | 228 | |||
4731 | 229 | |||
4732 | 218 | def test_get_all_trees(self): | 230 | def test_get_all_trees(self): |
4733 | 219 | XML = etree.tostring(etree.parse(single_source_input,parser),pretty_print=True) | 231 | XML = etree.tostring(etree.parse(single_source_input,parser),pretty_print=True) |
4734 | 220 | tree = obtain_trees(XML) | 232 | tree = obtain_trees(XML) |
4735 | 221 | 233 | ||
4736 | === added file 'stk/test/data/input/auto_sub.phyml' | |||
4737 | --- stk/test/data/input/auto_sub.phyml 1970-01-01 00:00:00 +0000 | |||
4738 | +++ stk/test/data/input/auto_sub.phyml 2017-01-12 09:27:31 +0000 | |||
4739 | @@ -0,0 +1,97 @@ | |||
4740 | 1 | <?xml version='1.0' encoding='utf-8'?> | ||
4741 | 2 | <phylo_storage> | ||
4742 | 3 | <project_name> | ||
4743 | 4 | <string_value lines="1">Test</string_value> | ||
4744 | 5 | </project_name> | ||
4745 | 6 | <sources> | ||
4746 | 7 | <source name="Hill_2011"> | ||
4747 | 8 | <bibliographic_information> | ||
4748 | 9 | <article> | ||
4749 | 10 | <authors> | ||
4750 | 11 | <author> | ||
4751 | 12 | <surname> | ||
4752 | 13 | <string_value lines="1">Hill</string_value> | ||
4753 | 14 | </surname> | ||
4754 | 15 | <other_names> | ||
4755 | 16 | <string_value lines="1">Jon</string_value> | ||
4756 | 17 | </other_names> | ||
4757 | 18 | </author> | ||
4758 | 19 | </authors> | ||
4759 | 20 | <title> | ||
4760 | 21 | <string_value lines="1">A great paper</string_value> | ||
4761 | 22 | </title> | ||
4762 | 23 | <year> | ||
4763 | 24 | <integer_value rank="0">2011</integer_value> | ||
4764 | 25 | </year> | ||
4765 | 26 | <journal> | ||
4766 | 27 | <string_value lines="1">Nature</string_value> | ||
4767 | 28 | </journal> | ||
4768 | 29 | <pages> | ||
4769 | 30 | <string_value lines="1">1-12</string_value> | ||
4770 | 31 | </pages> | ||
4771 | 32 | </article> | ||
4772 | 33 | </bibliographic_information> | ||
4773 | 34 | <source_tree name="Hill_2011_1"> | ||
4774 | 35 | <tree> | ||
4775 | 36 | <tree_string> | ||
4776 | 37 | <string_value lines="1">(Thalassarche_melanophris, Pelecaniformes, (Gallus, Gallus_varius));</string_value> | ||
4777 | 38 | </tree_string> | ||
4778 | 39 | <figure_legend> | ||
4779 | 40 | <string_value lines="1">NA</string_value> | ||
4780 | 41 | </figure_legend> | ||
4781 | 42 | <figure_number> | ||
4782 | 43 | <string_value lines="1">1</string_value> | ||
4783 | 44 | </figure_number> | ||
4784 | 45 | <page_number> | ||
4785 | 46 | <string_value lines="1">1</string_value> | ||
4786 | 47 | </page_number> | ||
4787 | 48 | <tree_inference> | ||
4788 | 49 | <optimality_criterion name="Maximum Parsimony"/> | ||
4789 | 50 | </tree_inference> | ||
4790 | 51 | <topology> | ||
4791 | 52 | <outgroup> | ||
4792 | 53 | <string_value lines="1">A</string_value> | ||
4793 | 54 | </outgroup> | ||
4794 | 55 | </topology> | ||
4795 | 56 | </tree> | ||
4796 | 57 | <taxa_data> | ||
4797 | 58 | <all_extant/> | ||
4798 | 59 | </taxa_data> | ||
4799 | 60 | <character_data> | ||
4800 | 61 | <character type="molecular" name="12S"/> | ||
4801 | 62 | </character_data> | ||
4802 | 63 | </source_tree> | ||
4803 | 64 | <source_tree name="Hill_2011_2"> | ||
4804 | 65 | <tree> | ||
4805 | 66 | <tree_string> | ||
4806 | 67 | <string_value lines="1">(Gallus_lafayetii, (Platalea_leucorodia, (Ardea_humbloti, Ardea_goliath)));</string_value> | ||
4807 | 68 | </tree_string> | ||
4808 | 69 | <figure_legend> | ||
4809 | 70 | <string_value lines="1">NA</string_value> | ||
4810 | 71 | </figure_legend> | ||
4811 | 72 | <figure_number> | ||
4812 | 73 | <string_value lines="1">1</string_value> | ||
4813 | 74 | </figure_number> | ||
4814 | 75 | <page_number> | ||
4815 | 76 | <string_value lines="1">1</string_value> | ||
4816 | 77 | </page_number> | ||
4817 | 78 | <tree_inference> | ||
4818 | 79 | <optimality_criterion name="Maximum Parsimony"/> | ||
4819 | 80 | </tree_inference> | ||
4820 | 81 | <topology> | ||
4821 | 82 | <outgroup> | ||
4822 | 83 | <string_value lines="1">A</string_value> | ||
4823 | 84 | </outgroup> | ||
4824 | 85 | </topology> | ||
4825 | 86 | </tree> | ||
4826 | 87 | <taxa_data> | ||
4827 | 88 | <all_extant/> | ||
4828 | 89 | </taxa_data> | ||
4829 | 90 | <character_data> | ||
4830 | 91 | <character type="molecular" name="12S"/> | ||
4831 | 92 | </character_data> | ||
4832 | 93 | </source_tree> | ||
4833 | 94 | </source> | ||
4834 | 95 | </sources> | ||
4835 | 96 | <history/> | ||
4836 | 97 | </phylo_storage> | ||
4837 | 0 | 98 | ||
4838 | === modified file 'stk/test/data/input/check_data_ind.phyml' | |||
4839 | --- stk/test/data/input/check_data_ind.phyml 2014-10-09 09:33:21 +0000 | |||
4840 | +++ stk/test/data/input/check_data_ind.phyml 2017-01-12 09:27:31 +0000 | |||
4841 | @@ -249,6 +249,147 @@ | |||
4842 | 249 | <character type="molecular" name="12S"/> | 249 | <character type="molecular" name="12S"/> |
4843 | 250 | </character_data> | 250 | </character_data> |
4844 | 251 | </source_tree> | 251 | </source_tree> |
4845 | 252 | <source_tree name="Hill_Davis_2011_3"> | ||
4846 | 253 | <tree> | ||
4847 | 254 | <tree_string> | ||
4848 | 255 | <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,H:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value> | ||
4849 | 256 | </tree_string> | ||
4850 | 257 | <figure_legend> | ||
4851 | 258 | <string_value lines="1">NA</string_value> | ||
4852 | 259 | </figure_legend> | ||
4853 | 260 | <figure_number> | ||
4854 | 261 | <string_value lines="1">0</string_value> | ||
4855 | 262 | </figure_number> | ||
4856 | 263 | <page_number> | ||
4857 | 264 | <string_value lines="1">0</string_value> | ||
4858 | 265 | </page_number> | ||
4859 | 266 | <tree_inference> | ||
4860 | 267 | <optimality_criterion name="Maximum Parsimony"/> | ||
4861 | 268 | </tree_inference> | ||
4862 | 269 | <topology> | ||
4863 | 270 | <outgroup> | ||
4864 | 271 | <string_value lines="1">A</string_value> | ||
4865 | 272 | </outgroup> | ||
4866 | 273 | </topology> | ||
4867 | 274 | </tree> | ||
4868 | 275 | <taxa_data> | ||
4869 | 276 | <mixed_fossil_and_extant> | ||
4870 | 277 | <taxon name="A"> | ||
4871 | 278 | <fossil/> | ||
4872 | 279 | </taxon> | ||
4873 | 280 | <taxon name="B"> | ||
4874 | 281 | <fossil/> | ||
4875 | 282 | </taxon> | ||
4876 | 283 | </mixed_fossil_and_extant> | ||
4877 | 284 | </taxa_data> | ||
4878 | 285 | <character_data> | ||
4879 | 286 | <character type="molecular" name="12S"/> | ||
4880 | 287 | </character_data> | ||
4881 | 288 | </source_tree> | ||
4882 | 289 | </source> | ||
4883 | 290 | <source name="Hill_Davis_2013"> | ||
4884 | 291 | <bibliographic_information> | ||
4885 | 292 | <article> | ||
4886 | 293 | <authors> | ||
4887 | 294 | <author> | ||
4888 | 295 | <surname> | ||
4889 | 296 | <string_value lines="1">Hill</string_value> | ||
4890 | 297 | </surname> | ||
4891 | 298 | <other_names> | ||
4892 | 299 | <string_value lines="1">Jon</string_value> | ||
4893 | 300 | </other_names> | ||
4894 | 301 | </author> | ||
4895 | 302 | <author> | ||
4896 | 303 | <surname> | ||
4897 | 304 | <string_value lines="1">Davis</string_value> | ||
4898 | 305 | </surname> | ||
4899 | 306 | <other_names> | ||
4900 | 307 | <string_value lines="1">Katie</string_value> | ||
4901 | 308 | </other_names> | ||
4902 | 309 | </author> | ||
4903 | 310 | </authors> | ||
4904 | 311 | <title> | ||
4905 | 312 | <string_value lines="1">Another superb paper</string_value> | ||
4906 | 313 | </title> | ||
4907 | 314 | <year> | ||
4908 | 315 | <integer_value rank="0">2013</integer_value> | ||
4909 | 316 | </year> | ||
4910 | 317 | </article> | ||
4911 | 318 | </bibliographic_information> | ||
4912 | 319 | <source_tree name="Hill_Davis_2013_1"> | ||
4913 | 320 | <tree> | ||
4914 | 321 | <tree_string> | ||
4915 | 322 | <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,Z:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value> | ||
4916 | 323 | </tree_string> | ||
4917 | 324 | <figure_legend> | ||
4918 | 325 | <string_value lines="1">NA</string_value> | ||
4919 | 326 | </figure_legend> | ||
4920 | 327 | <figure_number> | ||
4921 | 328 | <string_value lines="1">0</string_value> | ||
4922 | 329 | </figure_number> | ||
4923 | 330 | <page_number> | ||
4924 | 331 | <string_value lines="1">0</string_value> | ||
4925 | 332 | </page_number> | ||
4926 | 333 | <tree_inference> | ||
4927 | 334 | <optimality_criterion name="Maximum Parsimony"/> | ||
4928 | 335 | </tree_inference> | ||
4929 | 336 | <topology> | ||
4930 | 337 | <outgroup> | ||
4931 | 338 | <string_value lines="1">A</string_value> | ||
4932 | 339 | </outgroup> | ||
4933 | 340 | </topology> | ||
4934 | 341 | </tree> | ||
4935 | 342 | <taxa_data> | ||
4936 | 343 | <mixed_fossil_and_extant> | ||
4937 | 344 | <taxon name="A"> | ||
4938 | 345 | <fossil/> | ||
4939 | 346 | </taxon> | ||
4940 | 347 | <taxon name="B"> | ||
4941 | 348 | <fossil/> | ||
4942 | 349 | </taxon> | ||
4943 | 350 | </mixed_fossil_and_extant> | ||
4944 | 351 | </taxa_data> | ||
4945 | 352 | <character_data> | ||
4946 | 353 | <character type="molecular" name="12S"/> | ||
4947 | 354 | </character_data> | ||
4948 | 355 | </source_tree> | ||
4949 | 356 | <source_tree name="Hill_Davis_2013_2"> | ||
4950 | 357 | <tree> | ||
4951 | 358 | <tree_string> | ||
4952 | 359 | <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,Z:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value> | ||
4953 | 360 | </tree_string> | ||
4954 | 361 | <figure_legend> | ||
4955 | 362 | <string_value lines="1">NA</string_value> | ||
4956 | 363 | </figure_legend> | ||
4957 | 364 | <figure_number> | ||
4958 | 365 | <string_value lines="1">0</string_value> | ||
4959 | 366 | </figure_number> | ||
4960 | 367 | <page_number> | ||
4961 | 368 | <string_value lines="1">0</string_value> | ||
4962 | 369 | </page_number> | ||
4963 | 370 | <tree_inference> | ||
4964 | 371 | <optimality_criterion name="Maximum Parsimony"/> | ||
4965 | 372 | </tree_inference> | ||
4966 | 373 | <topology> | ||
4967 | 374 | <outgroup> | ||
4968 | 375 | <string_value lines="1">A</string_value> | ||
4969 | 376 | </outgroup> | ||
4970 | 377 | </topology> | ||
4971 | 378 | </tree> | ||
4972 | 379 | <taxa_data> | ||
4973 | 380 | <mixed_fossil_and_extant> | ||
4974 | 381 | <taxon name="A"> | ||
4975 | 382 | <fossil/> | ||
4976 | 383 | </taxon> | ||
4977 | 384 | <taxon name="B"> | ||
4978 | 385 | <fossil/> | ||
4979 | 386 | </taxon> | ||
4980 | 387 | </mixed_fossil_and_extant> | ||
4981 | 388 | </taxa_data> | ||
4982 | 389 | <character_data> | ||
4983 | 390 | <character type="molecular" name="12S"/> | ||
4984 | 391 | </character_data> | ||
4985 | 392 | </source_tree> | ||
4986 | 252 | </source> | 393 | </source> |
4987 | 253 | </sources> | 394 | </sources> |
4988 | 254 | <history/> | 395 | <history/> |
4989 | 255 | 396 | ||
4990 | === added file 'stk/test/data/input/check_taxonomy.phyml' | |||
4991 | --- stk/test/data/input/check_taxonomy.phyml 1970-01-01 00:00:00 +0000 | |||
4992 | +++ stk/test/data/input/check_taxonomy.phyml 2017-01-12 09:27:31 +0000 | |||
4993 | @@ -0,0 +1,67 @@ | |||
4994 | 1 | <?xml version='1.0' encoding='utf-8'?> | ||
4995 | 2 | <phylo_storage> | ||
4996 | 3 | <project_name> | ||
4997 | 4 | <string_value lines="1">Test</string_value> | ||
4998 | 5 | </project_name> | ||
4999 | 6 | <sources> | ||
5000 | 7 | <source name="Hill_2011"> |
The diff has been truncated for viewing.