Merge lp:~jon-hill/supertree-toolkit/sub_in_subfile into lp:supertree-toolkit
Proposed by: Jon Hill
Status: Merged
Merged at revision: 281
Proposed branch: lp:~jon-hill/supertree-toolkit/sub_in_subfile
Merge into: lp:supertree-toolkit
Diff against target: 13342 lines (+11326/-781), 44 files modified:
  debian/control (+1/-1)
  debian/rules (+1/-0)
  notes.txt (+38/-0)
  stk/bzr_version.py (+5/-5)
  stk/p4/NexusToken.py (+1/-0)
  stk/p4/NexusToken2.py (+1/-1)
  stk/p4/Tree.py (+1/-9)
  stk/p4/Tree_muck.py (+4/-2)
  stk/scripts/check_nomenclature.py (+0/-224)
  stk/scripts/check_nomenclature.py.moved (+224/-0)
  stk/scripts/create_colours_itol.py (+2/-11)
  stk/scripts/create_taxonomy.py (+4/-100)
  stk/scripts/fill_in_with_taxonomy.py (+711/-174)
  stk/scripts/plot_character_taxa_matrix.py (+83/-1)
  stk/scripts/plot_tree_taxa_matrix.py (+56/-0)
  stk/scripts/remove_poorly_constrained_taxa.py (+43/-20)
  stk/scripts/tree_from_taxonomy.py (+142/-0)
  stk/stk (+787/-34)
  stk/stk_exceptions.py (+8/-0)
  stk/supertree_toolkit.py (+849/-47)
  stk/test/_substitute_taxa.py (+19/-1)
  stk/test/_supertree_toolkit.py (+138/-15)
  stk/test/_trees.py (+13/-1)
  stk/test/data/input/auto_sub.phyml (+97/-0)
  stk/test/data/input/check_data_ind.phyml (+141/-0)
  stk/test/data/input/check_taxonomy.phyml (+67/-0)
  stk/test/data/input/check_taxonomy_fixes.phyml (+378/-0)
  stk/test/data/input/create_taxonomy.csv (+6/-6)
  stk/test/data/input/create_taxonomy.phyml (+67/-0)
  stk/test/data/input/equivalents.csv (+5/-0)
  stk/test/data/input/mrca.tre (+1/-0)
  stk/test/data/input/old_stk_test_data_ind.phyml (+1324/-0)
  stk/test/data/input/old_stk_test_data_tax_overlap.phyml (+627/-0)
  stk/test/data/input/old_stk_test_nonmonophyl_removed.phyml (+1324/-0)
  stk/test/data/input/old_stk_test_species_level.phyml (+1324/-0)
  stk/test/data/input/old_stk_test_taxonomy.csv (+334/-0)
  stk/test/data/input/old_stk_test_taxonomy_check_subs.dat (+26/-0)
  stk/test/data/input/old_stk_test_taxonomy_checked.phyml (+1324/-0)
  stk/test/data/input/old_stk_test_taxonomy_checker.csv (+336/-0)
  stk/test/data/output/one_click_subs_output.phyml (+97/-0)
  stk/test/util.py (+7/-0)
  stk_gui/gui/gui.glade (+670/-124)
  stk_gui/plugins/phyml/name_author.py (+4/-1)
  stk_gui/stk_gui/interface.py (+36/-4)
To merge this branch: bzr merge lp:~jon-hill/supertree-toolkit/sub_in_subfile
Related bugs: none listed
Reviewer: Jon Hill (status: Approve)
Review via email: mp+314598@code.launchpad.net
Commit message
Description of the change
Adding taxonomic awareness and fixing a lot of bugs
Revision history for this message
Jon Hill (jon-hill):
review: Approve
Preview Diff
1 | === modified file 'debian/control' |
2 | --- debian/control 2016-12-14 16:22:12 +0000 |
3 | +++ debian/control 2017-01-12 09:27:31 +0000 |
4 | @@ -9,7 +9,7 @@ |
5 | |
6 | Package: supertree-toolkit |
7 | Architecture: all |
8 | -Depends: python-tk, python-dxdiff, python-pygraphviz, python-lxml-dbg, python-lxml, python-gtk2, python-numpy, python-matplotlib, python-lxml, libxml2-utils, python, python-gtksourceview2, python-glade2, python-networkx |
9 | +Depends: python-tk, python-simplejson, python-dxdiff, python-pygraphviz, python-lxml-dbg, python-lxml, python-gtk2, python-numpy, python-matplotlib, python-lxml, libxml2-utils, python, python-gtksourceview2, python-glade2, python-networkx, python-argcomplete |
10 | Recommends: python-psyco |
11 | Suggests: |
12 | Conflicts: |
13 | |
14 | === modified file 'debian/rules' |
15 | --- debian/rules 2013-10-14 12:58:59 +0000 |
16 | +++ debian/rules 2017-01-12 09:27:31 +0000 |
17 | @@ -6,5 +6,6 @@ |
18 | |
19 | override_dh_auto_install: |
20 | python setup.py install --root=debian/supertree-toolkit --install-layout=deb --install-scripts=/usr/bin |
21 | + argcomplete.autocomplete(parser) |
22 | |
23 | override_dh_auto_build: |
24 | |
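The new python-argcomplete dependency in debian/control indicates that the stk command-line tool is gaining shell tab completion. As a point of reference, a minimal sketch of the usual argcomplete wiring for an argparse-based entry point is shown below; the subcommand name is illustrative only and is not the actual stk interface, and argcomplete.autocomplete() is normally called from the Python script itself, immediately before parse_args().

    #!/usr/bin/env python
    # PYTHON_ARGCOMPLETE_OK
    # Hypothetical sketch of argcomplete wiring for an argparse CLI; names are placeholders.
    import argparse
    import argcomplete

    parser = argparse.ArgumentParser(prog="stk")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.add_parser("example_command")   # illustrative subcommand only
    argcomplete.autocomplete(parser)           # must run before parse_args()
    args = parser.parse_args()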
25 | === added file 'notes.txt' |
26 | --- notes.txt 1970-01-01 00:00:00 +0000 |
27 | +++ notes.txt 2017-01-12 09:27:31 +0000 |
28 | @@ -0,0 +1,38 @@ |
29 | +Ideas: |
30 | + |
31 | +Collect data, remove paraphyletic |
32 | + |
33 | +Take taxonomy (from dbs), phyml, users knowledge (encoded as subs file) and information on synonyms (from dbs) |
34 | +to create a master subs file that takes the dat to species level |
35 | + |
36 | +User needs to be able to edit taxonomy - CSV file |
37 | + |
38 | +User needs to choose database source - preferred source. |
39 | + |
40 | + |
41 | +Taxonomic name checker: |
42 | + |
43 | + - use database to get synonyms and possible mispellings |
44 | + - Gui is a 2 column table with green, yellow, red. User filles in red (or removes it), green is fine. Yellow - drop down list with alternatives. |
45 | + - Use this to generate a two column CSV file |
46 | + - On CLI, generate a three column CSV. Original name, new name (or blank for unknown) and a list of possibles. Warn user they *must* fill in the second column or remove the row or the taxa will be deleted. |
47 | + |
48 | +For colloqual names, user adds to column 1 of taxonomy csv and then adds the latin name in the approriate column of the database. The subs can then generate the species list. |
49 | + |
50 | +Use these two csv files to generate a subs file, including replacing higher taxa and genera to create a "to species" substtution (can also output this file for later) |
51 | + |
52 | +Generating data to any taxonomic level can happen later - need to check each species is accounted for in the taxonomy, with correct levels - may need another parse of the taxonomy csv |
53 | + |
54 | + |
55 | +Add data -> paraphyletic taxa -> taxonomy checker -> sub synonyms -> taxonomy generator -> create species level dataset |
56 | + |
57 | +New functions: |
58 | + - taxonomic name checker (this might take a while when online for large dataset) - note that this should be a one for one substitution - seperate function so we can check this? |
59 | + - Pull in taxonomy generator |
60 | + - Add csv file to schema |
61 | + - amaend manual with workflow |
62 | + - warning on multiple subs in data in manual |
63 | + - generate species level subsfile from taxonomy |
64 | + - generate specified taxonomic level data |
65 | + |
66 | + |
67 | |
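The notes.txt file added above outlines the intended workflow: the taxonomic name checker writes a CSV of original name, candidate replacements and a status colour, and that CSV is then turned into a substitution ("subs") file that brings the data to species level. A hedged sketch of that last step is given below; the column layout matches the checker output described in the notes, but the "old = new" subs syntax and the handling of red entries are assumptions for illustration, not the toolkit's actual implementation.

    import csv

    # Hypothetical sketch: convert a checker CSV (original, synonyms joined by ';', status)
    # into one-for-one "old = new" substitution lines.
    def subs_from_equivalents(equiv_csv, subs_file):
        with open(equiv_csv, 'r') as f, open(subs_file, 'w') as out:
            for row in csv.reader(f):
                original, synonyms, status = row[0], row[1].split(';'), row[2]
                if status == 'green':       # name already valid, nothing to substitute
                    continue
                if status == 'red':         # unknown name: user must resolve it or the taxon is dropped
                    out.write("%s = \n" % original)
                else:                       # yellow/amber: take the first (preferred) synonym
                    out.write("%s = %s\n" % (original, synonyms[0]))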
68 | === modified file 'stk/bzr_version.py' |
69 | --- stk/bzr_version.py 2017-01-11 17:42:56 +0000 |
70 | +++ stk/bzr_version.py 2017-01-12 09:27:31 +0000 |
71 | @@ -4,12 +4,12 @@ |
72 | So don't edit it. :) |
73 | """ |
74 | |
75 | -version_info = {'branch_nick': u'supertree-toolkit', |
76 | - 'build_date': '2017-01-11 17:42:27 +0000', |
77 | +version_info = {'branch_nick': u'sub_in_subfile', |
78 | + 'build_date': '2017-01-11 17:48:33 +0000', |
79 | 'clean': None, |
80 | - 'date': '2017-01-11 17:39:43 +0000', |
81 | - 'revision_id': 'jon.hill@imperial.ac.uk-20170111173943-88so1icr33su3afo', |
82 | - 'revno': '279'} |
83 | + 'date': '2017-01-11 17:48:18 +0000', |
84 | + 'revision_id': 'jon.hill@imperial.ac.uk-20170111174818-9q8a9octvnawruuw', |
85 | + 'revno': '317'} |
86 | |
87 | revisions = {} |
88 | |
89 | |
90 | === modified file 'stk/p4/NexusToken.py' |
91 | --- stk/p4/NexusToken.py 2012-01-11 08:57:43 +0000 |
92 | +++ stk/p4/NexusToken.py 2017-01-12 09:27:31 +0000 |
93 | @@ -44,6 +44,7 @@ |
94 | gm = ["safeNextTok(), called from %s" % caller] |
95 | else: |
96 | gm = ["safeNextTok()"] |
97 | + print flob |
98 | gm.append("Premature Death.") |
99 | gm.append("Ran out of understandable things to read in nexus file.") |
100 | raise Glitch, gm |
101 | |
102 | === modified file 'stk/p4/NexusToken2.py' |
103 | --- stk/p4/NexusToken2.py 2012-01-11 08:57:43 +0000 |
104 | +++ stk/p4/NexusToken2.py 2017-01-12 09:27:31 +0000 |
105 | @@ -88,7 +88,7 @@ |
106 | else: |
107 | gm = ["safeNextTok()"] |
108 | gm.append("Premature Death.") |
109 | - gm.append("Ran out of understandable things to read in nexus file.") |
110 | + gm.append("Ran out of understandable things to read in nexus file." + str(flob)) |
111 | raise Glitch, gm |
112 | else: |
113 | return t |
114 | |
115 | === modified file 'stk/p4/Tree.py' |
116 | --- stk/p4/Tree.py 2013-08-25 09:24:34 +0000 |
117 | +++ stk/p4/Tree.py 2017-01-12 09:27:31 +0000 |
118 | @@ -996,17 +996,9 @@ |
119 | if not item.name: |
120 | if item == self.root: |
121 | if var.fixRootedTrees: |
122 | - if self.name: |
123 | - print "Tree.initFinish() tree '%s'" % self.name |
124 | - else: |
125 | - print 'Tree.initFinish()' |
126 | - print "Fixing tree to work with SuperTree scores" |
127 | + #print "Fixing tree to work with SuperTree scores" |
128 | self.removeRoot() |
129 | elif var.warnAboutTerminalRootWithNoName: |
130 | - if self.name: |
131 | - print "Tree.initFinish() tree '%s'" % self.name |
132 | - else: |
133 | - print 'Tree.initFinish()' |
134 | print ' Non-fatal warning: the root is terminal, but has no name.' |
135 | print ' This may be what you wanted. Or not?' |
136 | print ' (To get rid of this warning, turn off var.warnAboutTerminalRootWithNoName)' |
137 | |
138 | === modified file 'stk/p4/Tree_muck.py' |
139 | --- stk/p4/Tree_muck.py 2015-02-19 14:47:06 +0000 |
140 | +++ stk/p4/Tree_muck.py 2017-01-12 09:27:31 +0000 |
141 | @@ -769,6 +769,7 @@ |
142 | else: |
143 | gm.append("The 2 specified nodes should have a parent-child relationship") |
144 | raise Glitch, gm |
145 | + |
146 | if var.usePfAndNumpy: |
147 | self.deleteCStuff() |
148 | |
149 | @@ -1629,7 +1630,7 @@ |
150 | |
151 | |
152 | |
153 | -def addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None): |
154 | +def addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None, ignoreRootAssert=False): |
155 | """Add a subtree to a tree. |
156 | |
157 | The nodes from theSubTree are added to self.nodes, and theSubTree |
158 | @@ -1666,7 +1667,8 @@ |
159 | |
160 | assert selfNode in self.nodes |
161 | assert selfNode.parent |
162 | - assert theSubTree.root.leftChild and not theSubTree.root.leftChild.sibling # its a root on a stick |
163 | + if not ignoreRootAssert: |
164 | + assert theSubTree.root.leftChild and not theSubTree.root.leftChild.sibling # its a root on a stick |
165 | if not subTreeTaxNames: |
166 | subTreeTaxNames = [n.name for n in theSubTree.iterLeavesNoRoot()] |
167 | |
168 | |
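The patched addSubTree() above gains an ignoreRootAssert keyword so callers can graft a subtree whose root is not "on a stick". A hedged usage sketch, where target, subtree and node are placeholder p4 Tree and Node objects:

    # Hypothetical call with the new keyword; `target`, `subtree` and `node` are placeholders.
    target.addSubTree(node, subtree, ignoreRootAssert=True)   # skip the root-on-a-stick assertion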
169 | === removed file 'stk/scripts/check_nomenclature.py' |
170 | --- stk/scripts/check_nomenclature.py 2016-07-14 10:12:17 +0000 |
171 | +++ stk/scripts/check_nomenclature.py 1970-01-01 00:00:00 +0000 |
172 | @@ -1,224 +0,0 @@ |
173 | -#!/usr/bin/env python |
174 | -# |
175 | -# Derived from the Supertree Toolkit. Software for managing and manipulating sources |
176 | -# trees ready for supretree construction. |
177 | -# Copyright (C) 2015, Jon Hill, Katie Davis |
178 | -# |
179 | -# This program is free software: you can redistribute it and/or modify |
180 | -# it under the terms of the GNU General Public License as published by |
181 | -# the Free Software Foundation, either version 3 of the License, or |
182 | -# (at your option) any later version. |
183 | -# |
184 | -# This program is distributed in the hope that it will be useful, |
185 | -# but WITHOUT ANY WARRANTY; without even the implied warranty of |
186 | -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
187 | -# GNU General Public License for more details. |
188 | -# |
189 | -# You should have received a copy of the GNU General Public License |
190 | -# along with this program. If not, see <http://www.gnu.org/licenses/>. |
191 | -# |
192 | -# Jon Hill. jon.hill@york.ac.uk. |
193 | -# |
194 | -# |
195 | -# This is an enitrely self-contained script that does not require the STK to be installed. |
196 | - |
197 | -import urllib2 |
198 | -from urllib import quote_plus |
199 | -import simplejson as json |
200 | -import argparse |
201 | -import os |
202 | -import sys |
203 | -import csv |
204 | - |
205 | -def main(): |
206 | - |
207 | - # do stuff |
208 | - parser = argparse.ArgumentParser( |
209 | - prog="Check nomenclature", |
210 | - description="Check nomenclature from a tree file or list against valid names derived from EOL", |
211 | - ) |
212 | - parser.add_argument( |
213 | - '-v', |
214 | - '--verbose', |
215 | - action='store_true', |
216 | - help="Verbose output: mainly progress reports.", |
217 | - default=False |
218 | - ) |
219 | - parser.add_argument( |
220 | - '--existing', |
221 | - help="An existing output file to update further, e.g. with a new set of taxa. Supply the file name." |
222 | - ) |
223 | - parser.add_argument( |
224 | - 'input_file', |
225 | - metavar='input_file', |
226 | - nargs=1, |
227 | - help="Your input taxa list" |
228 | - ) |
229 | - parser.add_argument( |
230 | - 'output_file', |
231 | - metavar='output_file', |
232 | - nargs=1, |
233 | - help="The output file. A CSV-based output, listing name checked, valid name, synonyms and status (red, amber, yellow, green)." |
234 | - ) |
235 | - |
236 | - args = parser.parse_args() |
237 | - verbose = args.verbose |
238 | - input_file = args.input_file[0] |
239 | - output_file = args.output_file[0] |
240 | - existing_data = args.existing |
241 | - |
242 | - if (not existing_data == None): |
243 | - exiting_data = load_equivalents(existing_data) |
244 | - else: |
245 | - existing_data = None |
246 | - |
247 | - with open(input_file,'r') as f: |
248 | - lines = f.read().splitlines() |
249 | - equivs = taxonomic_checker_list(lines, existing_data, verbose=verbose) |
250 | - |
251 | - |
252 | - f = open(output_file,"w") |
253 | - for taxon in sorted(equivs.keys()): |
254 | - f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n") |
255 | - f.close() |
256 | - |
257 | - return |
258 | - |
259 | - |
260 | -def taxonomic_checker_list(name_list,existing_data=None,verbose=False): |
261 | - """ For each name in the database generate a database of the original name, |
262 | - possible synonyms and if the taxon is not know, signal that. We do this by |
263 | - using the EoL API to grab synonyms of each taxon. """ |
264 | - |
265 | - |
266 | - if existing_data == None: |
267 | - equivalents = {} |
268 | - else: |
269 | - equivalents = existing_data |
270 | - |
271 | - # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result? |
272 | - # if not, is there another API function to do this? |
273 | - # search for the taxon and grab the name - if you search for a recognised synonym on EoL then |
274 | - # you get the original ('correct') name - shorten this to two words and you're done. |
275 | - for t in name_list: |
276 | - # make sure t has no spaces. |
277 | - t = t.replace(" ","_") |
278 | - if t in equivalents: |
279 | - continue |
280 | - taxon = t.replace("_"," ") |
281 | - if (verbose): |
282 | - print "Looking up ", taxon |
283 | - # get the data from EOL on taxon |
284 | - taxonq = quote_plus(taxon) |
285 | - URL = "http://eol.org/api/search/1.0.json?q="+taxonq |
286 | - req = urllib2.Request(URL) |
287 | - opener = urllib2.build_opener() |
288 | - f = opener.open(req) |
289 | - data = json.load(f) |
290 | - # check if there's some data |
291 | - if len(data['results']) == 0: |
292 | - equivalents[t] = [[t],'red'] |
293 | - continue |
294 | - amber = False |
295 | - if len(data['results']) > 1: |
296 | - # this is not great - we have multiple hits for this taxon - needs the user to go back and warn about this |
297 | - # for automatic processing we'll just take the first one though |
298 | - # colour is amber in this case |
299 | - amber = True |
300 | - ID = str(data['results'][0]['id']) # take first hit |
301 | - URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=2&videos=0&sounds=0&maps=0&text=2&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0" |
302 | - req = urllib2.Request(URL) |
303 | - opener = urllib2.build_opener() |
304 | - |
305 | - try: |
306 | - f = opener.open(req) |
307 | - except urllib2.HTTPError: |
308 | - equivalents[t] = [[t],'red'] |
309 | - continue |
310 | - data = json.load(f) |
311 | - if len(data['scientificName']) == 0: |
312 | - # not found a scientific name, so set as red |
313 | - equivalents[t] = [[t],'red'] |
314 | - continue |
315 | - correct_name = data['scientificName'].encode("ascii","ignore") |
316 | - # we only want the first two bits of the name, not the original author and year if any |
317 | - temp_name = correct_name.split(' ') |
318 | - if (len(temp_name) > 2): |
319 | - correct_name = ' '.join(temp_name[0:2]) |
320 | - correct_name = correct_name.replace(' ','_') |
321 | - print correct_name, t |
322 | - |
323 | - # build up the output dictionary - original name is key, synonyms/missing is value |
324 | - if (correct_name == t or correct_name == taxon): |
325 | - # if the original matches the 'correct', then it's green |
326 | - equivalents[t] = [[t], 'green'] |
327 | - else: |
328 | - # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the |
329 | - # 'correct' taxon at the top |
330 | - eol_synonyms = data['synonyms'] |
331 | - synonyms = [] |
332 | - for s in eol_synonyms: |
333 | - ts = s['synonym'].encode("ascii","ignore") |
334 | - temp_syn = ts.split(' ') |
335 | - if (len(temp_syn) > 2): |
336 | - temp_syn = ' '.join(temp_syn[0:2]) |
337 | - ts = temp_syn |
338 | - if (s['relationship'] == "synonym"): |
339 | - ts = ts.replace(" ","_") |
340 | - synonyms.append(ts) |
341 | - synonyms = _uniquify(synonyms) |
342 | - # we need to put the correct name at the top of the list now |
343 | - if (correct_name in synonyms): |
344 | - synonyms.insert(0, synonyms.pop(synonyms.index(correct_name))) |
345 | - elif len(synonyms) == 0: |
346 | - synonyms.append(correct_name) |
347 | - else: |
348 | - synonyms.insert(0,correct_name) |
349 | - |
350 | - if (amber): |
351 | - equivalents[t] = [synonyms,'amber'] |
352 | - else: |
353 | - equivalents[t] = [synonyms,'yellow'] |
354 | - # if our search was empty, then it's red - see above |
355 | - |
356 | - # up to the calling funciton to do something sensible with this |
357 | - # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow, red. |
358 | - # Amber means we found synonyms and multilpe hits. User def needs to sort these! |
359 | - |
360 | - return equivalents |
361 | - |
362 | -def load_equivalents(equiv_csv): |
363 | - """Load equivalents data from a csv and convert to a equivalents Dict. |
364 | - Structure is key, with a list that is array of synonyms, followed by status ('green', |
365 | - 'yellow', 'amber', or 'red'). |
366 | - |
367 | - """ |
368 | - |
369 | - import csv |
370 | - |
371 | - equivalents = {} |
372 | - |
373 | - with open(equiv_csv, 'rU') as csvfile: |
374 | - equiv_reader = csv.reader(csvfile, delimiter=',') |
375 | - equiv_reader.next() # skip header |
376 | - for row in equiv_reader: |
377 | - i = 1 |
378 | - equivalents[row[0]] = [row[1].split(';'),row[2]] |
379 | - |
380 | - return equivalents |
381 | - |
382 | -def _uniquify(l): |
383 | - """ |
384 | - Make a list, l, contain only unique data |
385 | - """ |
386 | - keys = {} |
387 | - for e in l: |
388 | - keys[e] = 1 |
389 | - |
390 | - return keys.keys() |
391 | - |
392 | -if __name__ == "__main__": |
393 | - main() |
394 | - |
395 | - |
396 | - |
397 | |
398 | === added file 'stk/scripts/check_nomenclature.py.moved' |
399 | --- stk/scripts/check_nomenclature.py.moved 1970-01-01 00:00:00 +0000 |
400 | +++ stk/scripts/check_nomenclature.py.moved 2017-01-12 09:27:31 +0000 |
401 | @@ -0,0 +1,224 @@ |
402 | +#!/usr/bin/env python |
403 | +# |
404 | +# Derived from the Supertree Toolkit. Software for managing and manipulating sources |
405 | +# trees ready for supretree construction. |
406 | +# Copyright (C) 2015, Jon Hill, Katie Davis |
407 | +# |
408 | +# This program is free software: you can redistribute it and/or modify |
409 | +# it under the terms of the GNU General Public License as published by |
410 | +# the Free Software Foundation, either version 3 of the License, or |
411 | +# (at your option) any later version. |
412 | +# |
413 | +# This program is distributed in the hope that it will be useful, |
414 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
415 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
416 | +# GNU General Public License for more details. |
417 | +# |
418 | +# You should have received a copy of the GNU General Public License |
419 | +# along with this program. If not, see <http://www.gnu.org/licenses/>. |
420 | +# |
421 | +# Jon Hill. jon.hill@york.ac.uk. |
422 | +# |
423 | +# |
424 | +# This is an enitrely self-contained script that does not require the STK to be installed. |
425 | + |
426 | +import urllib2 |
427 | +from urllib import quote_plus |
428 | +import simplejson as json |
429 | +import argparse |
430 | +import os |
431 | +import sys |
432 | +import csv |
433 | + |
434 | +def main(): |
435 | + |
436 | + # do stuff |
437 | + parser = argparse.ArgumentParser( |
438 | + prog="Check nomenclature", |
439 | + description="Check nomenclature from a tree file or list against valid names derived from EOL", |
440 | + ) |
441 | + parser.add_argument( |
442 | + '-v', |
443 | + '--verbose', |
444 | + action='store_true', |
445 | + help="Verbose output: mainly progress reports.", |
446 | + default=False |
447 | + ) |
448 | + parser.add_argument( |
449 | + '--existing', |
450 | + help="An existing output file to update further, e.g. with a new set of taxa. Supply the file name." |
451 | + ) |
452 | + parser.add_argument( |
453 | + 'input_file', |
454 | + metavar='input_file', |
455 | + nargs=1, |
456 | + help="Your input taxa list" |
457 | + ) |
458 | + parser.add_argument( |
459 | + 'output_file', |
460 | + metavar='output_file', |
461 | + nargs=1, |
462 | + help="The output file. A CSV-based output, listing name checked, valid name, synonyms and status (red, amber, yellow, green)." |
463 | + ) |
464 | + |
465 | + args = parser.parse_args() |
466 | + verbose = args.verbose |
467 | + input_file = args.input_file[0] |
468 | + output_file = args.output_file[0] |
469 | + existing_data = args.existing |
470 | + |
471 | + if (not existing_data == None): |
472 | + exiting_data = load_equivalents(existing_data) |
473 | + else: |
474 | + existing_data = None |
475 | + |
476 | + with open(input_file,'r') as f: |
477 | + lines = f.read().splitlines() |
478 | + equivs = taxonomic_checker_list(lines, existing_data, verbose=verbose) |
479 | + |
480 | + |
481 | + f = open(output_file,"w") |
482 | + for taxon in sorted(equivs.keys()): |
483 | + f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n") |
484 | + f.close() |
485 | + |
486 | + return |
487 | + |
488 | + |
489 | +def taxonomic_checker_list(name_list,existing_data=None,verbose=False): |
490 | + """ For each name in the database generate a database of the original name, |
491 | + possible synonyms and if the taxon is not know, signal that. We do this by |
492 | + using the EoL API to grab synonyms of each taxon. """ |
493 | + |
494 | + |
495 | + if existing_data == None: |
496 | + equivalents = {} |
497 | + else: |
498 | + equivalents = existing_data |
499 | + |
500 | + # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result? |
501 | + # if not, is there another API function to do this? |
502 | + # search for the taxon and grab the name - if you search for a recognised synonym on EoL then |
503 | + # you get the original ('correct') name - shorten this to two words and you're done. |
504 | + for t in name_list: |
505 | + # make sure t has no spaces. |
506 | + t = t.replace(" ","_") |
507 | + if t in equivalents: |
508 | + continue |
509 | + taxon = t.replace("_"," ") |
510 | + if (verbose): |
511 | + print "Looking up ", taxon |
512 | + # get the data from EOL on taxon |
513 | + taxonq = quote_plus(taxon) |
514 | + URL = "http://eol.org/api/search/1.0.json?q="+taxonq |
515 | + req = urllib2.Request(URL) |
516 | + opener = urllib2.build_opener() |
517 | + f = opener.open(req) |
518 | + data = json.load(f) |
519 | + # check if there's some data |
520 | + if len(data['results']) == 0: |
521 | + equivalents[t] = [[t],'red'] |
522 | + continue |
523 | + amber = False |
524 | + if len(data['results']) > 1: |
525 | + # this is not great - we have multiple hits for this taxon - needs the user to go back and warn about this |
526 | + # for automatic processing we'll just take the first one though |
527 | + # colour is amber in this case |
528 | + amber = True |
529 | + ID = str(data['results'][0]['id']) # take first hit |
530 | + URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=2&videos=0&sounds=0&maps=0&text=2&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0" |
531 | + req = urllib2.Request(URL) |
532 | + opener = urllib2.build_opener() |
533 | + |
534 | + try: |
535 | + f = opener.open(req) |
536 | + except urllib2.HTTPError: |
537 | + equivalents[t] = [[t],'red'] |
538 | + continue |
539 | + data = json.load(f) |
540 | + if len(data['scientificName']) == 0: |
541 | + # not found a scientific name, so set as red |
542 | + equivalents[t] = [[t],'red'] |
543 | + continue |
544 | + correct_name = data['scientificName'].encode("ascii","ignore") |
545 | + # we only want the first two bits of the name, not the original author and year if any |
546 | + temp_name = correct_name.split(' ') |
547 | + if (len(temp_name) > 2): |
548 | + correct_name = ' '.join(temp_name[0:2]) |
549 | + correct_name = correct_name.replace(' ','_') |
550 | + print correct_name, t |
551 | + |
552 | + # build up the output dictionary - original name is key, synonyms/missing is value |
553 | + if (correct_name == t or correct_name == taxon): |
554 | + # if the original matches the 'correct', then it's green |
555 | + equivalents[t] = [[t], 'green'] |
556 | + else: |
557 | + # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the |
558 | + # 'correct' taxon at the top |
559 | + eol_synonyms = data['synonyms'] |
560 | + synonyms = [] |
561 | + for s in eol_synonyms: |
562 | + ts = s['synonym'].encode("ascii","ignore") |
563 | + temp_syn = ts.split(' ') |
564 | + if (len(temp_syn) > 2): |
565 | + temp_syn = ' '.join(temp_syn[0:2]) |
566 | + ts = temp_syn |
567 | + if (s['relationship'] == "synonym"): |
568 | + ts = ts.replace(" ","_") |
569 | + synonyms.append(ts) |
570 | + synonyms = _uniquify(synonyms) |
571 | + # we need to put the correct name at the top of the list now |
572 | + if (correct_name in synonyms): |
573 | + synonyms.insert(0, synonyms.pop(synonyms.index(correct_name))) |
574 | + elif len(synonyms) == 0: |
575 | + synonyms.append(correct_name) |
576 | + else: |
577 | + synonyms.insert(0,correct_name) |
578 | + |
579 | + if (amber): |
580 | + equivalents[t] = [synonyms,'amber'] |
581 | + else: |
582 | + equivalents[t] = [synonyms,'yellow'] |
583 | + # if our search was empty, then it's red - see above |
584 | + |
585 | + # up to the calling funciton to do something sensible with this |
586 | + # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow, red. |
587 | + # Amber means we found synonyms and multilpe hits. User def needs to sort these! |
588 | + |
589 | + return equivalents |
590 | + |
591 | +def load_equivalents(equiv_csv): |
592 | + """Load equivalents data from a csv and convert to a equivalents Dict. |
593 | + Structure is key, with a list that is array of synonyms, followed by status ('green', |
594 | + 'yellow', 'amber', or 'red'). |
595 | + |
596 | + """ |
597 | + |
598 | + import csv |
599 | + |
600 | + equivalents = {} |
601 | + |
602 | + with open(equiv_csv, 'rU') as csvfile: |
603 | + equiv_reader = csv.reader(csvfile, delimiter=',') |
604 | + equiv_reader.next() # skip header |
605 | + for row in equiv_reader: |
606 | + i = 1 |
607 | + equivalents[row[0]] = [row[1].split(';'),row[2]] |
608 | + |
609 | + return equivalents |
610 | + |
611 | +def _uniquify(l): |
612 | + """ |
613 | + Make a list, l, contain only unique data |
614 | + """ |
615 | + keys = {} |
616 | + for e in l: |
617 | + keys[e] = 1 |
618 | + |
619 | + return keys.keys() |
620 | + |
621 | +if __name__ == "__main__": |
622 | + main() |
623 | + |
624 | + |
625 | + |
626 | |
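The relocated check_nomenclature.py script is self-contained and does not require the STK to be installed. Based on the argparse options it defines, a typical invocation would look like the following; the file names are placeholders.

    # Check a plain-text list of taxa against EOL and write a status CSV:
    python check_nomenclature.py -v taxa_list.txt checked_names.csv
    # Update an earlier run with additional taxa:
    python check_nomenclature.py --existing checked_names.csv -v more_taxa.txt checked_names_updated.csv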
627 | === modified file 'stk/scripts/create_colours_itol.py' |
628 | --- stk/scripts/create_colours_itol.py 2014-12-09 10:58:48 +0000 |
629 | +++ stk/scripts/create_colours_itol.py 2017-01-12 09:27:31 +0000 |
630 | @@ -88,17 +88,8 @@ |
631 | saturation=0.25 |
632 | value=0.8 |
633 | |
634 | - index = 3 # family |
635 | - if (level == "Superfamily"): |
636 | - index = 4 |
637 | - elif (level == "Infraorder"): |
638 | - index = 5 |
639 | - elif (level == "Suborder"): |
640 | - index = 6 |
641 | - elif (level == "Order"): |
642 | - index = 7 |
643 | - elif (level == "Genus"): |
644 | - index = 2 |
645 | + index = stk.taxonomy_levels.index(level.lower())+1 |
646 | + print index |
647 | |
648 | if (tree): |
649 | tree_data = stk.import_tree(input_file) |
650 | |
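The create_colours_itol.py change above replaces a hard-coded if/elif ladder with a lookup into stk.taxonomy_levels. A minimal sketch of why the two are equivalent, assuming stk.taxonomy_levels uses the same ordering as the list removed from create_taxonomy.py in the next hunk:

    # Sketch only: taxonomy_levels copied from the list removed from create_taxonomy.py below;
    # the real stk.taxonomy_levels is assumed to share this ordering.
    taxonomy_levels = ['species', 'genus', 'family', 'superfamily', 'infraorder',
                       'suborder', 'order', 'superorder', 'subclass', 'class',
                       'subphylum', 'phylum', 'superphylum', 'infrakingdom',
                       'subkingdom', 'kingdom']

    def level_index(level):
        """Column index used by the script for a given rank name."""
        return taxonomy_levels.index(level.lower()) + 1

    # Reproduces the old hard-coded values:
    #   level_index('Genus') == 2, level_index('Family') == 3, level_index('Superfamily') == 4,
    #   level_index('Infraorder') == 5, level_index('Suborder') == 6, level_index('Order') == 7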
651 | === modified file 'stk/scripts/create_taxonomy.py' |
652 | --- stk/scripts/create_taxonomy.py 2014-03-13 18:45:05 +0000 |
653 | +++ stk/scripts/create_taxonomy.py 2017-01-12 09:27:31 +0000 |
654 | @@ -16,6 +16,8 @@ |
655 | import supertree_toolkit as stk |
656 | import csv |
657 | |
658 | +taxonomy_levels = stk.taxonomy_levels |
659 | + |
660 | def main(): |
661 | |
662 | # do stuff |
663 | @@ -66,13 +68,6 @@ |
664 | f.close() |
665 | |
666 | taxonomy = {} |
667 | - # What we get from EOL |
668 | - current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom'] |
669 | - # And the extra ones from ITIS |
670 | - extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom'] |
671 | - # all of them in order |
672 | - taxonomy_levels = ['species','genus','family','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] |
673 | - |
674 | |
675 | for taxon in taxa: |
676 | taxon = taxon.replace("_"," ") |
677 | @@ -180,99 +175,8 @@ |
678 | continue |
679 | |
680 | |
681 | - # Now create the CSV output |
682 | - with open(output_file, 'w') as f: |
683 | - writer = csv.writer(f) |
684 | - writer.writerow(taxonomy_levels) |
685 | - for t in taxonomy: |
686 | - species = t |
687 | - try: |
688 | - genus = taxonomy[t]['genus'] |
689 | - except KeyError: |
690 | - genus = "-" |
691 | - try: |
692 | - family = taxonomy[t]['family'] |
693 | - except KeyError: |
694 | - family = "-" |
695 | - try: |
696 | - superfamily = taxonomy[t]['superfamily'] |
697 | - except KeyError: |
698 | - superfamily = "-" |
699 | - try: |
700 | - infraorder = taxonomy[t]['infraorder'] |
701 | - except KeyError: |
702 | - infraorder = "-" |
703 | - try: |
704 | - suborder = taxonomy[t]['suborder'] |
705 | - except KeyError: |
706 | - suborder = "-" |
707 | - try: |
708 | - order = taxonomy[t]['order'] |
709 | - except KeyError: |
710 | - order = "-" |
711 | - try: |
712 | - superorder = taxonomy[t]['superorder'] |
713 | - except KeyError: |
714 | - superorder = "-" |
715 | - try: |
716 | - subclass = taxonomy[t]['subclass'] |
717 | - except KeyError: |
718 | - subclass = "-" |
719 | - try: |
720 | - tclass = taxonomy[t]['class'] |
721 | - except KeyError: |
722 | - tclass = "-" |
723 | - try: |
724 | - subphylum = taxonomy[t]['subphylum'] |
725 | - except KeyError: |
726 | - subphylum = "-" |
727 | - try: |
728 | - phylum = taxonomy[t]['phylum'] |
729 | - except KeyError: |
730 | - phylum = "-" |
731 | - try: |
732 | - superphylum = taxonomy[t]['superphylum'] |
733 | - except KeyError: |
734 | - superphylum = "-" |
735 | - try: |
736 | - infrakingdom = taxonomy[t]['infrakingdom'] |
737 | - except: |
738 | - infrakingdom = "-" |
739 | - try: |
740 | - subkingdom = taxonomy[t]['subkingdom'] |
741 | - except: |
742 | - subkingdom = "-" |
743 | - try: |
744 | - kingdom = taxonomy[t]['kingdom'] |
745 | - except KeyError: |
746 | - kingdom = "-" |
747 | - try: |
748 | - provider = taxonomy[t]['provider'] |
749 | - except KeyError: |
750 | - provider = "-" |
751 | - |
752 | - |
753 | - this_classification = [ |
754 | - species.encode('utf-8'), |
755 | - genus.encode('utf-8'), |
756 | - family.encode('utf-8'), |
757 | - superfamily.encode('utf-8'), |
758 | - infraorder.encode('utf-8'), |
759 | - suborder.encode('utf-8'), |
760 | - order.encode('utf-8'), |
761 | - superorder.encode('utf-8'), |
762 | - subclass.encode('utf-8'), |
763 | - tclass.encode('utf-8'), |
764 | - subphylum.encode('utf-8'), |
765 | - phylum.encode('utf-8'), |
766 | - superphylum.encode('utf-8'), |
767 | - infrakingdom.encode('utf-8'), |
768 | - subkingdom.encode('utf-8'), |
769 | - kingdom.encode('utf-8'), |
770 | - provider.encode('utf-8')] |
771 | - writer.writerow(this_classification) |
772 | - |
773 | - |
774 | + stk.save_taxonomy(taxonomy, output_file) |
775 | + |
776 | def _uniquify(l): |
777 | """ |
778 | Make a list, l, contain only unique data |
779 | |
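The create_taxonomy.py hunk above collapses roughly a hundred lines of per-rank try/except CSV writing into a single stk.save_taxonomy() call. A hedged sketch of what such a helper could look like, derived from the removed code; the real stk.save_taxonomy may differ in column order, header layout and encoding handling.

    import csv

    def save_taxonomy_sketch(taxonomy, output_file):
        # taxonomy is a dict of {species_name: {rank_name: value, 'provider': source}}
        # as built by the script; missing ranks are written as "-".
        levels = ['species', 'genus', 'family', 'superfamily', 'infraorder',
                  'suborder', 'order', 'superorder', 'subclass', 'class',
                  'subphylum', 'phylum', 'superphylum', 'infrakingdom',
                  'subkingdom', 'kingdom']
        with open(output_file, 'w') as f:
            writer = csv.writer(f)
            writer.writerow(levels + ['provider'])
            for species, ranks in taxonomy.items():
                row = [species] + [ranks.get(level, '-') for level in levels[1:]]
                row.append(ranks.get('provider', '-'))
                writer.writerow(row)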
780 | === modified file 'stk/scripts/fill_in_with_taxonomy.py' |
781 | --- stk/scripts/fill_in_with_taxonomy.py 2016-12-14 16:22:12 +0000 |
782 | +++ stk/scripts/fill_in_with_taxonomy.py 2017-01-12 09:27:31 +0000 |
783 | @@ -23,21 +23,90 @@ |
784 | from urllib import quote_plus |
785 | import simplejson as json |
786 | import argparse |
787 | +import copy |
788 | import os |
789 | import sys |
790 | stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir ) |
791 | sys.path.insert(0, stk_path) |
792 | import supertree_toolkit as stk |
793 | import csv |
794 | - |
795 | -# What we get from EOL |
796 | -current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom'] |
797 | -# And the extra ones from ITIS |
798 | -extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom'] |
799 | -# all of them in order |
800 | -taxonomy_levels = ['species','genus','subfamily','family','tribe','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] |
801 | - |
802 | -def get_tree_taxa_taxonomy(taxon,wsdlObjectWoRMS): |
803 | +from ete2 import Tree |
804 | +import tempfile |
805 | +import re |
806 | + |
807 | +taxonomy_levels = stk.taxonomy_levels |
808 | +#tlevels = ['species','genus','family','superfamily','suborder','order','class','phylum','kingdom'] |
809 | +tlevels = ['species','genus', 'subfamily', 'family','infraorder','order','class','phylum','kingdom'] |
810 | + |
811 | +def get_tree_taxa_taxonomy_eol(taxon): |
812 | + |
813 | + taxonq = quote_plus(taxon) |
814 | + URL = "http://eol.org/api/search/1.0.json?q="+taxonq |
815 | + req = urllib2.Request(URL) |
816 | + opener = urllib2.build_opener() |
817 | + f = opener.open(req) |
818 | + data = json.load(f) |
819 | + |
820 | + if data['results'] == []: |
821 | + return {} |
822 | + ID = str(data['results'][0]['id']) # take first hit |
823 | + # Now look for taxonomies |
824 | + URL = "http://eol.org/api/pages/1.0/"+ID+".json" |
825 | + req = urllib2.Request(URL) |
826 | + opener = urllib2.build_opener() |
827 | + f = opener.open(req) |
828 | + data = json.load(f) |
829 | + if len(data['taxonConcepts']) == 0: |
830 | + return {} |
831 | + TID = str(data['taxonConcepts'][0]['identifier']) # take first hit |
832 | + currentdb = str(data['taxonConcepts'][0]['nameAccordingTo']) |
833 | + # loop through and get preferred one if specified |
834 | + # now get taxonomy |
835 | + for db in data['taxonConcepts']: |
836 | + currentdb = db['nameAccordingTo'].lower() |
837 | + TID = str(db['identifier']) |
838 | + break |
839 | + URL="http://eol.org/api/hierarchy_entries/1.0/"+TID+".json" |
840 | + req = urllib2.Request(URL) |
841 | + opener = urllib2.build_opener() |
842 | + f = opener.open(req) |
843 | + data = json.load(f) |
844 | + tax_array = {} |
845 | + tax_array['provider'] = currentdb |
846 | + for a in data['ancestors']: |
847 | + try: |
848 | + if a.has_key('taxonRank') : |
849 | + temp_level = a['taxonRank'].encode("ascii","ignore") |
850 | + if (temp_level in taxonomy_levels): |
851 | + # note the dump into ASCII |
852 | + temp_name = a['scientificName'].encode("ascii","ignore") |
853 | + temp_name = temp_name.split(" ") |
854 | + if (temp_level == 'species'): |
855 | + tax_array[temp_level] = "_".join(temp_name[0:2]) |
856 | + |
857 | + else: |
858 | + tax_array[temp_level] = temp_name[0] |
859 | + except KeyError as e: |
860 | + logging.exception("Key not found: taxonRank") |
861 | + continue |
862 | + try: |
863 | + # add taxonomy in to the taxonomy! |
864 | + # some issues here, so let's make sure it's OK |
865 | + temp_name = taxon.split(" ") |
866 | + if data.has_key('taxonRank') : |
867 | + if not data['taxonRank'].lower() == 'species': |
868 | + tax_array[data['taxonRank'].lower()] = temp_name[0] |
869 | + else: |
870 | + tax_array[data['taxonRank'].lower()] = ' '.join(temp_name[0:2]) |
871 | + except KeyError as e: |
872 | + return tax_array |
873 | + |
874 | + return tax_array |
875 | + |
876 | +def get_tree_taxa_taxonomy_worms(taxon): |
877 | + |
878 | + from SOAPpy import WSDL |
879 | + wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1') |
880 | |
881 | taxon_data = wsdlObjectWoRMS.getAphiaRecords(taxon.replace('_',' ')) |
882 | if taxon_data == None: |
883 | @@ -51,6 +120,8 @@ |
884 | classification = wsdlObjectWoRMS.getAphiaClassificationByID(taxon_id) |
885 | # construct array |
886 | tax_array = {} |
887 | + if (classification == ""): |
888 | + return {} |
889 | # classification is a nested dictionary, so we need to iterate down it |
890 | current_child = classification.child |
891 | while True: |
892 | @@ -60,27 +131,252 @@ |
893 | break |
894 | return tax_array |
895 | |
896 | - |
897 | - |
898 | -def get_taxonomy_worms(taxonomy, start_otu): |
899 | +def get_tree_taxa_taxonomy_itis(taxon): |
900 | + |
901 | + URL="http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(taxon.replace('_',' ').strip()) |
902 | + req = urllib2.Request(URL) |
903 | + opener = urllib2.build_opener() |
904 | + f = opener.open(req) |
905 | + string = unicode(f.read(),"ISO-8859-1") |
906 | + this_item = json.loads(string) |
907 | + if this_item['scientificNames'] == [None]: # not found |
908 | + return {} |
909 | + tsn = this_item['scientificNames'][0]['tsn'] # there might be records that aren't valid - they point to the valid one though |
910 | + # so call another function to get any valid names |
911 | + URL="http://www.itis.gov/ITISWebService/jsonservice/getAcceptedNamesFromTSN?tsn="+tsn |
912 | + req = urllib2.Request(URL) |
913 | + opener = urllib2.build_opener() |
914 | + f = opener.open(req) |
915 | + string = unicode(f.read(),"ISO-8859-1") |
916 | + this_item = json.loads(string) |
917 | + if not this_item['acceptedNames'] == [None]: |
918 | + tsn = this_item['acceptedNames'][0]['acceptedTsn'] |
919 | + |
920 | + URL="http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+str(tsn) |
921 | + req = urllib2.Request(URL) |
922 | + opener = urllib2.build_opener() |
923 | + f = opener.open(req) |
924 | + string = unicode(f.read(),"ISO-8859-1") |
925 | + data = json.loads(string) |
926 | + # construct array |
927 | + this_taxonomy = {} |
928 | + for level in data['hierarchyList']: |
929 | + if level['rankName'].lower() in taxonomy_levels: |
930 | + # note the dump into ASCII |
931 | + this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore") |
932 | + |
933 | + return this_taxonomy |
934 | + |
935 | + |
936 | + |
937 | +def get_taxonomy_eol(taxonomy, start_otu, verbose,tmpfile=None,skip=False): |
938 | + |
939 | + # this is the recursive function |
940 | + def get_children(taxonomy, ID, aphiaIDsDone): |
941 | + |
942 | + # get data |
943 | + URL="http://eol.org/api/hierarchy_entries/1.0/"+str(ID)+".json?common_names=false&synonyms=false&cache_ttl=" |
944 | + req = urllib2.Request(URL) |
945 | + opener = urllib2.build_opener() |
946 | + f = opener.open(req) |
947 | + string = unicode(f.read(),"ISO-8859-1") |
948 | + this_item = json.loads(string) |
949 | + if this_item == None: |
950 | + return taxonomy |
951 | + if this_item['taxonRank'].lower().strip() == 'species': |
952 | + # add data to taxonomy dictionary |
953 | + taxon = this_item['scientificName'].split()[0:2] # just the first two words |
954 | + taxon = " ".join(taxon[0:2]) |
955 | + # NOTE following line means existing items are *not* updated |
956 | + if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy |
957 | + this_taxonomy = {} |
958 | + for level in this_item['ancestors']: |
959 | + if level['taxonRank'].lower() in taxonomy_levels: |
960 | + # note the dump into ASCII |
961 | + this_taxonomy[level['taxonRank'].lower().encode("ascii","ignore")] = level['scientificName'].encode("ascii","ignore") |
962 | + # add species: |
963 | + this_taxonomy['species'] = taxon.replace(" ","_") |
964 | + if verbose: |
965 | + print "\tAdding "+taxon |
966 | + taxonomy[taxon] = this_taxonomy |
967 | + if not tmpfile == None: |
968 | + stk.save_taxonomy(taxonomy,tmpfile) |
969 | + return taxonomy |
970 | + else: |
971 | + return taxonomy |
972 | + all_children = [] |
973 | + for level in this_item['children']: |
974 | + if not level == None: |
975 | + all_children.append(level['taxonID']) |
976 | + |
977 | + if (len(all_children) == 0): |
978 | + return taxonomy |
979 | + |
980 | + for child in all_children: |
981 | + if child in aphiaIDsDone: # we get stuck sometime |
982 | + continue |
983 | + aphiaIDsDone.append(child) |
984 | + taxonomy = get_children(taxonomy, child, aphiaIDsDone) |
985 | + return taxonomy |
986 | + |
987 | + |
988 | + # main bit of the get_taxonomy_eol function |
989 | + taxonq = quote_plus(start_otu) |
990 | + URL = "http://eol.org/api/search/1.0.json?q="+taxonq |
991 | + req = urllib2.Request(URL) |
992 | + opener = urllib2.build_opener() |
993 | + f = opener.open(req) |
994 | + data = json.load(f) |
995 | + start_id = str(data['results'][0]['id']) # this is the page ID. We get the species ID next |
996 | + URL = "http://eol.org/api/pages/1.0/"+start_id+".json" |
997 | + req = urllib2.Request(URL) |
998 | + opener = urllib2.build_opener() |
999 | + f = opener.open(req) |
1000 | + data = json.load(f) |
1001 | + if len(data['taxonConcepts']) == 0: |
1002 | + print "Error finding you start taxa. Spelling?" |
1003 | + return None |
1004 | + start_id = data['taxonConcepts'][0]['identifier'] |
1005 | + start_taxonomy_level = data['taxonConcepts'][0]['taxonRank'].lower() |
1006 | + |
1007 | + aphiaIDsDone = [] |
1008 | + if not skip: |
1009 | + taxonomy = get_children(taxonomy,start_id,aphiaIDsDone) |
1010 | + |
1011 | + return taxonomy, start_taxonomy_level |
1012 | + |
1013 | + |
1014 | + |
1015 | +def get_taxonomy_itis(taxonomy, start_otu, verbose,tmpfile=None,skip=False): |
1016 | + import simplejson as json |
1017 | + |
1018 | + # this is the recursive function |
1019 | + def get_children(taxonomy, ID, aphiaIDsDone): |
1020 | + |
1021 | + # get data |
1022 | + URL="http://www.itis.gov/ITISWebService/jsonservice/getFullRecordFromTSN?tsn="+ID |
1023 | + req = urllib2.Request(URL) |
1024 | + opener = urllib2.build_opener() |
1025 | + f = opener.open(req) |
1026 | + string = unicode(f.read(),"ISO-8859-1") |
1027 | + this_item = json.loads(string) |
1028 | + if this_item == None: |
1029 | + return taxonomy |
1030 | + if not this_item['usage']['taxonUsageRating'].lower() == 'valid': |
1031 | + print "rejecting " , this_item['scientificName']['combinedName'] |
1032 | + return taxonomy |
1033 | + if this_item['taxRank']['rankName'].lower().strip() == 'species': |
1034 | + # add data to taxonomy dictionary |
1035 | + taxon = this_item['scientificName']['combinedName'] |
1036 | + # NOTE following line means existing items are *not* updated |
1037 | + if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy |
1038 | + # get the taxonomy of this species |
1039 | + tsn = this_item["scientificName"]["tsn"] |
1040 | + URL="http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+tsn |
1041 | + req = urllib2.Request(URL) |
1042 | + opener = urllib2.build_opener() |
1043 | + f = opener.open(req) |
1044 | + string = unicode(f.read(),"ISO-8859-1") |
1045 | + data = json.loads(string) |
1046 | + this_taxonomy = {} |
1047 | + for level in data['hierarchyList']: |
1048 | + if level['rankName'].lower() in taxonomy_levels: |
1049 | + # note the dump into ASCII |
1050 | + this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore") |
1051 | + if verbose: |
1052 | + print "\tAdding "+taxon |
1053 | + taxonomy[taxon] = this_taxonomy |
1054 | + if not tmpfile == None: |
1055 | + stk.save_taxonomy(taxonomy,tmpfile) |
1056 | + return taxonomy |
1057 | + else: |
1058 | + return taxonomy |
1059 | + |
1060 | + all_children = [] |
1061 | + URL="http://www.itis.gov/ITISWebService/jsonservice/getHierarchyDownFromTSN?tsn="+ID |
1062 | + req = urllib2.Request(URL) |
1063 | + opener = urllib2.build_opener() |
1064 | + f = opener.open(req) |
1065 | + string = unicode(f.read(),"ISO-8859-1") |
1066 | + this_item = json.loads(string) |
1067 | + if this_item == None: |
1068 | + return taxonomy |
1069 | + |
1070 | + for level in this_item['hierarchyList']: |
1071 | + if not level == None: |
1072 | + all_children.append(level['tsn']) |
1073 | + |
1074 | + if (len(all_children) == 0): |
1075 | + return taxonomy |
1076 | + |
1077 | + for child in all_children: |
1078 | + if child in aphiaIDsDone: # we get stuck sometime |
1079 | + continue |
1080 | + aphiaIDsDone.append(child) |
1081 | + taxonomy = get_children(taxonomy, child, aphiaIDsDone) |
1082 | + |
1083 | + return taxonomy |
1084 | + |
1085 | + |
1086 | + # main bit of the get_taxonomy_worms function |
1087 | + URL="http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(start_otu.strip()) |
1088 | + req = urllib2.Request(URL) |
1089 | + opener = urllib2.build_opener() |
1090 | + f = opener.open(req) |
1091 | + string = unicode(f.read(),"ISO-8859-1") |
1092 | + this_item = json.loads(string) |
1093 | + start_id = this_item['scientificNames'][0]['tsn'] # there might be records that aren't valid - they point to the valid one though |
1094 | + # call it again via the ID this time to make sure we've got the right one. |
1095 | + # so call another function to get any valid names |
1096 | + URL="http://www.itis.gov/ITISWebService/jsonservice/getAcceptedNamesFromTSN?tsn="+start_id |
1097 | + req = urllib2.Request(URL) |
1098 | + opener = urllib2.build_opener() |
1099 | + f = opener.open(req) |
1100 | + string = unicode(f.read(),"ISO-8859-1") |
1101 | + this_item = json.loads(string) |
1102 | + if not this_item['acceptedNames'] == [None]: |
1103 | + start_id = this_item['acceptedNames'][0]['acceptedTsn'] |
1104 | + |
1105 | + URL="http://www.itis.gov/ITISWebService/jsonservice/getFullRecordFromTSN?tsn="+start_id |
1106 | + req = urllib2.Request(URL) |
1107 | + opener = urllib2.build_opener() |
1108 | + f = opener.open(req) |
1109 | + string = unicode(f.read(),"ISO-8859-1") |
1110 | + this_item = json.loads(string) |
1111 | + start_taxonomy_level = this_item['taxRank']['rankName'].lower() |
1112 | + |
1113 | + aphiaIDsDone = [] |
1114 | + if not skip: |
1115 | + taxonomy = get_children(taxonomy,start_id,aphiaIDsDone) |
1116 | + |
1117 | + return taxonomy, start_taxonomy_level |
1118 | + |
1119 | + |
1120 | + |
1121 | + |
1122 | +def get_taxonomy_worms(taxonomy, start_otu, verbose,tmpfile=None,skip=False): |
1123 | """ Gets and processes a taxon from the queue to get its taxonomy.""" |
1124 | from SOAPpy import WSDL |
1125 | |
1126 | wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1') |
1127 | |
1128 | # this is the recursive function |
1129 | - def get_children(taxonomy, ID): |
1130 | + def get_children(taxonomy, ID, aphiaIDsDone): |
1131 | |
1132 | # get data |
1133 | this_item = wsdlObjectWoRMS.getAphiaRecordByID(ID) |
1134 | if this_item == None: |
1135 | return taxonomy |
1136 | + if not this_item['status'].lower() == 'accepted': |
1137 | + print "rejecting " , this_item.valid_name |
1138 | + return taxonomy |
1139 | if this_item['rank'].lower() == 'species': |
1140 | # add data to taxonomy dictionary |
1141 | - # get the taxonomy of this species |
1142 | - classification = wsdlObjectWoRMS.getAphiaClassificationByID(ID) |
1143 | - taxon = this_item.scientificname |
1144 | + taxon = this_item.valid_name |
1145 | + # NOTE following line means existing items are *not* updated |
1146 | if not taxon in taxonomy: # is a new taxon, not previously in the taxonomy |
1147 | + # get the taxonomy of this species |
1148 | + classification = wsdlObjectWoRMS.getAphiaClassificationByID(ID) |
1149 | # construct array |
1150 | tax_array = {} |
1151 | # classification is a nested dictionary, so we need to iterate down it |
1152 | @@ -92,16 +388,36 @@ |
1153 | current_child = current_child.child |
1154 | if current_child == '': # empty one is a string for some reason |
1155 | break |
1156 | - taxonomy[this_item.scientificname] = tax_array |
1157 | + if verbose: |
1158 | + print "\tAdding "+this_item.scientificname |
1159 | + taxonomy[this_item.valid_name] = tax_array |
1160 | + if not tmpfile == None: |
1161 | + stk.save_taxonomy(taxonomy,tmpfile) |
1162 | return taxonomy |
1163 | else: |
1164 | return taxonomy |
1165 | |
1166 | - children = wsdlObjectWoRMS.getAphiaChildrenByID(ID, 1, False) |
1167 | - |
1168 | - for child in children: |
1169 | - taxonomy = get_children(taxonomy, child['valid_AphiaID']) |
1170 | - |
1171 | + all_children = [] |
1172 | + start = 1 |
1173 | + while True: |
1174 | + children = wsdlObjectWoRMS.getAphiaChildrenByID(ID, start, False) |
1175 | + if (children is None or children == None): |
1176 | + break |
1177 | + if (len(children) < 50): |
1178 | + all_children.extend(children) |
1179 | + break |
1180 | + all_children.extend(children) |
1181 | + start += 50 |
1182 | + |
1183 | + if (len(all_children) == 0): |
1184 | + return taxonomy |
1185 | + |
1186 | + for child in all_children: |
1187 | + if child['valid_AphiaID'] in aphiaIDsDone: # we get stuck sometime |
1188 | + continue |
1189 | + aphiaIDsDone.append(child['valid_AphiaID']) |
1190 | + taxonomy = get_children(taxonomy, child['valid_AphiaID'], aphiaIDsDone) |
1191 | + |
1192 | return taxonomy |
1193 | |
1194 | |
1195 | @@ -111,12 +427,17 @@ |
1196 | start_id = start_taxa[0]['valid_AphiaID'] # there might be records that aren't valid - they point to the valid one though |
1197 | # call it again via the ID this time to make sure we've got the right one. |
1198 | start_taxa = wsdlObjectWoRMS.getAphiaRecordByID(start_id) |
1199 | - start_taxonomy_level = start_taxa['rank'].lower() |
1200 | - except HTTPError: |
1201 | - print "Error" |
1202 | + if start_taxa == None: |
1203 | + start_taxonomy_level = 'infraorder' |
1204 | + else: |
1205 | + start_taxonomy_level = start_taxa['rank'].lower() |
1206 | + except urllib2.HTTPError: |
1207 | + print "Error finding start_otu taxonomic level. Do you have an internet connection?" |
1208 | sys.exit(-1) |
1209 | |
1210 | - taxonomy = get_children(taxonomy,start_id) |
1211 | + aphiaIDsDone = [] |
1212 | + if not skip: |
1213 | + taxonomy = get_children(taxonomy,start_id,aphiaIDsDone) |
1214 | |
1215 | return taxonomy, start_taxonomy_level |
1216 | |
1217 | @@ -136,9 +457,16 @@ |
1218 | default=False |
1219 | ) |
1220 | parser.add_argument( |
1221 | + '-s', |
1222 | + '--skip', |
1223 | + action='store_true', |
1224 | + help="Skip online checking, just use taxonomy files", |
1225 | + default=False |
1226 | + ) |
1227 | + parser.add_argument( |
1228 | '--pref_db', |
1229 | help="Taxonomy database to use. Default is Species 2000/ITIS", |
1230 | - choices=['itis', 'worms', 'ncbi'], |
1231 | + choices=['itis', 'worms', 'ncbi', 'eol'], |
1232 | default = 'worms' |
1233 | ) |
1234 | parser.add_argument( |
1235 | @@ -178,58 +506,250 @@ |
1236 | top_level = args.top_level[0] |
1237 | save_taxonomy_file = args.save_taxonomy |
1238 | tree_taxonomy = args.tree_taxonomy |
1239 | + taxonomy = args.taxonomy_from_file |
1240 | pref_db = args.pref_db |
1241 | + skip = args.skip |
1242 | if (save_taxonomy_file == None): |
1243 | save_taxonomy = False |
1244 | else: |
1245 | save_taxonomy = True |
1246 | + load_tree_taxonomy = False |
1247 | + if (not tree_taxonomy == None): |
1248 | + tree_taxonomy_file = tree_taxonomy |
1249 | + load_tree_taxonomy = True |
1250 | + if skip: |
1251 | + if taxonomy == None: |
1252 | + print "Error: If you're skipping checking online, then you need to supply taxonomy files" |
1253 | + return |
1254 | |
1255 | # grab taxa in tree |
1256 | tree = stk.import_tree(input_file) |
1257 | taxa_list = stk._getTaxaFromNewick(tree) |
1258 | - |
1259 | - taxonomy = {} |
1260 | - |
1261 | - # we're going to add the taxa in the tree to the taxonomy, to stop them |
1262 | + |
1263 | + if verbose: |
1264 | + print "Taxa count for input tree: ", len(taxa_list) |
1265 | + |
1266 | + # load in any taxonomy files - we still call the APIs as a) they may have updated data and |
1267 | + # b) the user may have missed some first time round (i.e. expanded the tree and not redone |
1268 | + # the taxonomy |
1269 | + if (taxonomy == None): |
1270 | + taxonomy = {} |
1271 | + else: |
1272 | + taxonomy = stk.load_taxonomy(taxonomy) |
1273 | + tree_taxonomy = {} |
1274 | + # this might also have tree_taxonomy in too - let's check this |
1275 | + for t in taxa_list: |
1276 | + if t in taxonomy: |
1277 | + tree_taxonomy[t] = taxonomy[t] |
1278 | + elif t.replace("_"," ") in taxonomy: |
1279 | + tree_taxonomy[t] = taxonomy[t.replace("_"," ")] |
1280 | + |
1281 | + if (load_tree_taxonomy): # overwrite the good work above... |
1282 | + tree_taxonomy = stk.load_taxonomy(tree_taxonomy_file) |
1283 | + if (tree_taxonomy == None): |
1284 | + tree_taxonomy = {} |
1285 | + |
1286 | + # we're going to add the taxa in the tree to the main WORMS taxonomy, to stop them |
1287 | # being fetched in first place. We delete them later |
1288 | + # If you've loaded a taxonomy created by this script, this overwrites the tree taxa in the main taxonomy dict |
1289 | + # Don't worry, we put them back in before saving again! |
1290 | for taxon in taxa_list: |
1291 | taxon = taxon.replace('_',' ') |
1292 | - taxonomy[taxon] = [] |
1293 | - |
1294 | + taxonomy[taxon] = {} |
1295 | |
1296 | if (pref_db == 'itis'): |
1297 | # get taxonomy info from itis |
1298 | - print "Sorry, ITIS is not implemented yet" |
1299 | - pass |
1300 | + if (verbose): |
1301 | + print "Getting data from ITIS" |
1302 | + if (verbose): |
1303 | + print "Dealing with taxa in tree" |
1304 | + for t in taxa_list: |
1305 | + if verbose: |
1306 | + print "\t"+t |
1307 | + if not(t in tree_taxonomy or t.replace("_"," ") in tree_taxonomy): |
1308 | + # we don't have data - NOTE we assume things are *not* updated here if we do |
1309 | + tree_taxonomy[t] = get_tree_taxa_taxonomy_itis(t) |
1310 | + |
1311 | + if save_taxonomy: |
1312 | + if (verbose): |
1313 | + print "Saving tree taxonomy" |
1314 | + # note -temporary save as we overwrite this file later. |
1315 | + stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv') |
1316 | + |
1317 | + # get taxonomy from worms |
1318 | + if verbose: |
1319 | + print "Now dealing with all other taxa - this might take a while..." |
1320 | + # create a temp file so we can checkpoint and continue |
1321 | + tmpf, tmpfile = tempfile.mkstemp() |
1322 | + |
1323 | + if os.path.isfile('.fit_lock'): |
1324 | + f = open('.fit_lock','r') |
1325 | + tf = f.read() |
1326 | + f.close() |
1327 | + if os.path.isfile(tf.strip()): |
1328 | + taxonomy = stk.load_taxonomy(tf.strip()) |
1329 | + os.remove('.fit_lock') |
1330 | + |
1331 | + # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue |
1332 | + # where we left off. |
1333 | + with open(".fit_lock", 'w') as f: |
1334 | + f.write(tmpfile) |
1335 | + # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function |
1336 | + taxonomy, start_level = get_taxonomy_itis(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there |
1337 | + |
1338 | + # clean up |
1339 | + os.close(tmpf) |
1340 | + os.remove('.fit_lock') |
1341 | + try: |
1342 | +            os.remove(tmpfile) # remove the checkpoint temp file
1343 | + except OSError: |
1344 | + pass |
1345 | elif (pref_db == 'worms'): |
1346 | + if (verbose): |
1347 | + print "Getting data from WoRMS" |
1348 | # get tree taxonomy from worms |
1349 | - if (tree_taxonomy == None): |
1350 | - tree_taxonomy = {} |
1351 | - for t in taxa_list: |
1352 | - from SOAPpy import WSDL |
1353 | - wsdlObjectWoRMS = WSDL.Proxy('http://www.marinespecies.org/aphia.php?p=soap&wsdl=1') |
1354 | - tree_taxonomy[t] = get_tree_taxa_taxonomy(t,wsdlObjectWoRMS) |
1355 | - else: |
1356 | - tree_taxonomy = stk.load_taxonomy(tree_taxonomy) |
1357 | + if (verbose): |
1358 | + print "Dealing with taxa in tree" |
1359 | + |
1360 | + for t in taxa_list: |
1361 | + if verbose: |
1362 | + print "\t"+t |
1363 | + if not(t in tree_taxonomy or t.replace("_"," ") in tree_taxonomy): |
1364 | + # we don't have data - NOTE we assume things are *not* updated here if we do |
1365 | + tree_taxonomy[t] = get_tree_taxa_taxonomy_worms(t) |
1366 | + |
1367 | + if save_taxonomy: |
1368 | + if (verbose): |
1369 | + print "Saving tree taxonomy" |
1370 | + # note -temporary save as we overwrite this file later. |
1371 | + stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv') |
1372 | + |
1373 | # get taxonomy from worms |
1374 | - taxonomy, start_level = get_taxonomy_worms(taxonomy,top_level) |
1375 | + if verbose: |
1376 | + print "Now dealing with all other taxa - this might take a while..." |
1377 | + # create a temp file so we can checkpoint and continue |
1378 | + tmpf, tmpfile = tempfile.mkstemp() |
1379 | + |
1380 | + if os.path.isfile('.fit_lock'): |
1381 | + f = open('.fit_lock','r') |
1382 | + tf = f.read() |
1383 | + f.close() |
1384 | + if os.path.isfile(tf.strip()): |
1385 | + taxonomy = stk.load_taxonomy(tf.strip()) |
1386 | + os.remove('.fit_lock') |
1387 | + |
1388 | + # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue |
1389 | + # where we left off. |
1390 | + with open(".fit_lock", 'w') as f: |
1391 | + f.write(tmpfile) |
1392 | + # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function |
1393 | + taxonomy, start_level = get_taxonomy_worms(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there |
1394 | + |
1395 | + # clean up |
1396 | + os.close(tmpf) |
1397 | + os.remove('.fit_lock') |
1398 | + try: |
1399 | +            os.remove(tmpfile) # remove the checkpoint temp file
1400 | + except OSError: |
1401 | + pass |
1402 | |
1403 | elif (pref_db == 'ncbi'): |
1404 | # get taxonomy from ncbi |
1405 | print "Sorry, NCBI is not implemented yet" |
1406 | pass |
1407 | + elif (pref_db == 'eol'): |
1408 | + if (verbose): |
1409 | + print "Getting data from EOL" |
1410 | + # get tree taxonomy from worms |
1411 | + if (verbose): |
1412 | + print "Dealing with taxa in tree" |
1413 | + for t in taxa_list: |
1414 | + if verbose: |
1415 | + print "\t"+t |
1416 | + try: |
1417 | + tree_taxonomy[t] |
1418 | + pass # we have data - NOTE we assume things are *not* updated here... |
1419 | + except KeyError: |
1420 | + try: |
1421 | + tree_taxonomy[t.replace('_',' ')] |
1422 | + except KeyError: |
1423 | + tree_taxonomy[t] = get_tree_taxa_taxonomy_eol(t) |
1424 | + |
1425 | + if save_taxonomy: |
1426 | + if (verbose): |
1427 | + print "Saving tree taxonomy" |
1428 | + # note -temporary save as we overwrite this file later. |
1429 | + stk.save_taxonomy(tree_taxonomy,save_taxonomy_file+'_tree.csv') |
1430 | + |
1431 | + # get taxonomy from worms |
1432 | + if verbose: |
1433 | + print "Now dealing with all other taxa - this might take a while..." |
1434 | + # create a temp file so we can checkpoint and continue |
1435 | + tmpf, tmpfile = tempfile.mkstemp() |
1436 | + |
1437 | + if os.path.isfile('.fit_lock'): |
1438 | + f = open('.fit_lock','r') |
1439 | + tf = f.read() |
1440 | + f.close() |
1441 | + if os.path.isfile(tf.strip()): |
1442 | + taxonomy = stk.load_taxonomy(tf.strip()) |
1443 | + os.remove('.fit_lock') |
1444 | + |
1445 | + # create lock file - if this is here, then we load from the file in the lock file (or try to) and continue |
1446 | + # where we left off. |
1447 | + with open(".fit_lock", 'w') as f: |
1448 | + f.write(tmpfile) |
1449 | + # bit naughty with tmpfile - we're using the filename rather than handle to write to it. Have to for write_taxonomy function |
1450 | + taxonomy, start_level = get_taxonomy_eol(taxonomy,top_level,verbose,tmpfile=tmpfile,skip=skip) # this skips ones already there |
1451 | + |
1452 | + # clean up |
1453 | + os.close(tmpf) |
1454 | + os.remove('.fit_lock') |
1455 | + try: |
1456 | +            os.remove(tmpfile) # remove the checkpoint temp file
1457 | + except OSError: |
1458 | + pass |
1459 | else: |
1460 | - print "ERROR: Didn't understand you database choice" |
1461 | + print "ERROR: Didn't understand your database choice" |
1462 | sys.exit(-1) |
1463 | |
1464 | # clean up taxonomy, deleting the ones already in the tree |
1465 | for taxon in taxa_list: |
1466 | - taxon = taxon.replace('_',' ') |
1467 | - del taxonomy[taxon] |
1468 | + taxon = taxon.replace('_',' ') |
1469 | + try: |
1470 | + del taxonomy[taxon] |
1471 | + except KeyError: |
1472 | +            pass # if it's not there, do we care?
1473 | + |
1474 | + # We now have 2 taxonomies: |
1475 | + # - for taxa in the tree |
1476 | + # - for all other taxa in the clade of interest |
1477 | + |
1478 | + if save_taxonomy: |
1479 | + tot_taxonomy = taxonomy.copy() |
1480 | + tot_taxonomy.update(tree_taxonomy) |
1481 | + stk.save_taxonomy(tot_taxonomy,save_taxonomy_file) |
1482 | + |
1483 | + |
1484 | + orig_taxa_list = taxa_list |
1485 | + |
1486 | + remove_higher_level = [] # for storing the higher level taxa in the original tree that need deleting |
1487 | + generic = [] |
1488 | +    # find all the generic-level taxa and build an internal subs file
1489 | + for t in taxa_list: |
1490 | + t = t.replace(" ","_") |
1491 | + if t.find("_") == -1: |
1492 | + # no underscore, so just generic |
1493 | + generic.append(t) |
1494 | |
1495 | # step up the taxonomy levels from genus, adding taxa to the correct node |
1496 | # as a polytomy |
1497 | - for level in taxonomy_levels[1::]: # skip species.... |
1498 | + start_level = start_level.encode('utf-8').strip() |
1499 | + if verbose: |
1500 | + print "I think your start OTU is at: ", start_level |
1501 | + for level in tlevels[1::]: # skip species.... |
1502 | + if verbose: |
1503 | + print "Dealing with ",level |
1504 | new_taxa = [] |
1505 | for t in taxonomy: |
1506 | # skip odd ones that should be in there |
1507 | @@ -239,135 +759,61 @@ |
1508 | except KeyError: |
1509 | continue # don't have this info |
1510 | new_taxa = _uniquify(new_taxa) |
1511 | + |
1512 | for nt in new_taxa: |
1513 | - taxa_to_add = [] |
1514 | + taxa_to_add = {} |
1515 | taxa_in_clade = [] |
1516 | for t in taxonomy: |
1517 | if start_level in taxonomy[t] and taxonomy[t][start_level] == top_level: |
1518 | try: |
1519 | - if taxonomy[t][level] == nt: |
1520 | - taxa_to_add.append(t.replace(' ','_')) |
1521 | + if taxonomy[t][level] == nt and not t in taxa_list: |
1522 | + taxa_to_add[t] = taxonomy[t] |
1523 | except KeyError: |
1524 | continue |
1525 | + |
1526 | # add to tree |
1527 | for t in taxa_list: |
1528 | if level in tree_taxonomy[t] and tree_taxonomy[t][level] == nt: |
1529 | taxa_in_clade.append(t) |
1530 | - if len(taxa_in_clade) > 0: |
1531 | - tree = add_taxa(tree, taxa_to_add, taxa_in_clade) |
1532 | - for t in taxa_to_add: # clean up taxonomy |
1533 | - del taxonomy[t.replace('_',' ')] |
1534 | - |
1535 | - |
1536 | + if t in generic: |
1537 | + # we are appending taxa to this higher taxon, so we need to remove it |
1538 | + remove_higher_level.append(t) |
1539 | + |
1540 | + |
1541 | + if len(taxa_in_clade) > 0 and len(taxa_to_add) > 0: |
1542 | + tree = add_taxa(tree, taxa_to_add, taxa_in_clade,level) |
1543 | + try: |
1544 | + taxa_list = stk._getTaxaFromNewick(tree) |
1545 | + except stk.TreeParseError as e: |
1546 | + print taxa_to_add, taxa_in_clade, level, tree |
1547 | + print e.msg |
1548 | + return |
1549 | + |
1550 | + for t in taxa_to_add: |
1551 | + tree_taxonomy[t.replace(' ','_')] = taxa_to_add[t] |
1552 | + try: |
1553 | + del taxonomy[t.replace('_',' ')] |
1554 | + except KeyError: |
1555 | + # It might have _ or it might not... |
1556 | + del taxonomy[t] |
1557 | + |
1558 | + |
1559 | +    # remove singleton nodes
1560 | + tree = stk._collapse_nodes(tree) |
1561 | + tree = stk._collapse_nodes(tree) |
1562 | + tree = stk._collapse_nodes(tree) |
1563 | + |
1564 | + tree = stk._sub_taxa_in_tree(tree, remove_higher_level) |
1565 | trees = {} |
1566 | trees['tree_1'] = tree |
1567 | output = stk._amalgamate_trees(trees,format='nexus') |
1568 | f = open(output_file, "w") |
1569 | f.write(output) |
1570 | f.close() |
1571 | - |
1572 | - if not save_taxonomy_file == None: |
1573 | - with open(save_taxonomy_file, 'w') as f: |
1574 | - writer = csv.writer(f) |
1575 | - headers = [] |
1576 | - headers.append("OTU") |
1577 | - headers.extend(taxonomy_levels) |
1578 | - headers.append("Data source") |
1579 | - writer.writerow(headers) |
1580 | - for t in taxonomy: |
1581 | - otu = t |
1582 | - try: |
1583 | - species = taxonomy[t]['species'] |
1584 | - except KeyError: |
1585 | - species = "-" |
1586 | - try: |
1587 | - genus = taxonomy[t]['genus'] |
1588 | - except KeyError: |
1589 | - genus = "-" |
1590 | - try: |
1591 | - family = taxonomy[t]['family'] |
1592 | - except KeyError: |
1593 | - family = "-" |
1594 | - try: |
1595 | - superfamily = taxonomy[t]['superfamily'] |
1596 | - except KeyError: |
1597 | - superfamily = "-" |
1598 | - try: |
1599 | - infraorder = taxonomy[t]['infraorder'] |
1600 | - except KeyError: |
1601 | - infraorder = "-" |
1602 | - try: |
1603 | - suborder = taxonomy[t]['suborder'] |
1604 | - except KeyError: |
1605 | - suborder = "-" |
1606 | - try: |
1607 | - order = taxonomy[t]['order'] |
1608 | - except KeyError: |
1609 | - order = "-" |
1610 | - try: |
1611 | - superorder = taxonomy[t]['superorder'] |
1612 | - except KeyError: |
1613 | - superorder = "-" |
1614 | - try: |
1615 | - subclass = taxonomy[t]['subclass'] |
1616 | - except KeyError: |
1617 | - subclass = "-" |
1618 | - try: |
1619 | - tclass = taxonomy[t]['class'] |
1620 | - except KeyError: |
1621 | - tclass = "-" |
1622 | - try: |
1623 | - subphylum = taxonomy[t]['subphylum'] |
1624 | - except KeyError: |
1625 | - subphylum = "-" |
1626 | - try: |
1627 | - phylum = taxonomy[t]['phylum'] |
1628 | - except KeyError: |
1629 | - phylum = "-" |
1630 | - try: |
1631 | - superphylum = taxonomy[t]['superphylum'] |
1632 | - except KeyError: |
1633 | - superphylum = "-" |
1634 | - try: |
1635 | - infrakingdom = taxonomy[t]['infrakingdom'] |
1636 | - except: |
1637 | - infrakingdom = "-" |
1638 | - try: |
1639 | - subkingdom = taxonomy[t]['subkingdom'] |
1640 | - except: |
1641 | - subkingdom = "-" |
1642 | - try: |
1643 | - kingdom = taxonomy[t]['kingdom'] |
1644 | - except KeyError: |
1645 | - kingdom = "-" |
1646 | - try: |
1647 | - provider = taxonomy[t]['provider'] |
1648 | - except KeyError: |
1649 | - provider = "-" |
1650 | - |
1651 | - if (isinstance(species, list)): |
1652 | - species = " ".join(species) |
1653 | - this_classification = [ |
1654 | - otu.encode('utf-8'), |
1655 | - species.encode('utf-8'), |
1656 | - genus.encode('utf-8'), |
1657 | - family.encode('utf-8'), |
1658 | - superfamily.encode('utf-8'), |
1659 | - infraorder.encode('utf-8'), |
1660 | - suborder.encode('utf-8'), |
1661 | - order.encode('utf-8'), |
1662 | - superorder.encode('utf-8'), |
1663 | - subclass.encode('utf-8'), |
1664 | - tclass.encode('utf-8'), |
1665 | - subphylum.encode('utf-8'), |
1666 | - phylum.encode('utf-8'), |
1667 | - superphylum.encode('utf-8'), |
1668 | - infrakingdom.encode('utf-8'), |
1669 | - subkingdom.encode('utf-8'), |
1670 | - kingdom.encode('utf-8'), |
1671 | - provider.encode('utf-8')] |
1672 | - writer.writerow(this_classification) |
1673 | - |
1674 | + taxa_list = stk._getTaxaFromNewick(tree) |
1675 | + |
1676 | + print "Final taxa count:", len(taxa_list) |
1677 | + |
1678 | |
1679 | def _uniquify(l): |
1680 | """ |
1681 | @@ -379,28 +825,119 @@ |
1682 | |
1683 | return keys.keys() |
1684 | |
1685 | -def add_taxa(tree, new_taxa, taxa_in_clade): |
1686 | +def add_taxa(tree, new_taxa, taxa_in_clade, level): |
1687 | |
1688 | # create new tree of the new taxa |
1689 | - #tree_string = "(" + ",".join(new_taxa) + ");" |
1690 | - #additionalTaxa = stk._parse_tree(tree_string) |
1691 | + additionalTaxa = tree_from_taxonomy(level,new_taxa) |
1692 | |
1693 | # find mrca parent |
1694 | treeobj = stk._parse_tree(tree) |
1695 | mrca = stk.get_mrca(tree,taxa_in_clade) |
1696 | - mrca_parent = treeobj.node(mrca).parent |
1697 | - |
1698 | - # insert a node into the tree between the MRCA and it's parent (p4.addNodeBetweenNodes) |
1699 | - newNode = treeobj.addNodeBetweenNodes(mrca, mrca_parent) |
1700 | - |
1701 | - # add the new tree at the new node using p4.addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None) |
1702 | - #treeobj.addSubTree(newNode, additionalTaxa) |
1703 | - for t in new_taxa: |
1704 | - treeobj.addSibLeaf(newNode,t) |
1705 | - |
1706 | - # return new tree |
1707 | + if (mrca == 0): |
1708 | + # we need to make a new tree! The additional taxa are being placed at the root of the tree |
1709 | + t = Tree() |
1710 | + A = t.add_child() |
1711 | + B = t.add_child() |
1712 | + t1 = Tree(additionalTaxa) |
1713 | + t2 = Tree(tree) |
1714 | + A.add_child(t1) |
1715 | + B.add_child(t2) |
1716 | + return t.write(format=9) |
1717 | + else: |
1718 | + mrca = treeobj.nodes[mrca] |
1719 | + additionalTaxa = stk._parse_tree(additionalTaxa) |
1720 | + |
1721 | + if len(taxa_in_clade) == 1: |
1722 | + taxon = treeobj.node(taxa_in_clade[0]) |
1723 | + mrca = treeobj.addNodeBetweenNodes(taxon,mrca) |
1724 | + |
1725 | + |
1726 | +        # insert a node into the tree between the MRCA and its parent (p4.addNodeBetweenNodes)
1727 | + # newNode = treeobj.addNodeBetweenNodes(mrca, mrca_parent) |
1728 | + |
1729 | + # add the new tree at the new node using p4.addSubTree(self, selfNode, theSubTree, subTreeTaxNames=None) |
1730 | + treeobj.addSubTree(mrca, additionalTaxa, ignoreRootAssert=True) |
1731 | + |
1732 | return treeobj.writeNewick(fName=None,toString=True).strip() |
1733 | |
1734 | + |
1735 | + |
1736 | +def tree_from_taxonomy(top_level, tree_taxonomy): |
1737 | + |
1738 | + start_level = taxonomy_levels.index(top_level) |
1739 | + new_taxa = tree_taxonomy.keys() |
1740 | + |
1741 | + tl_types = [] |
1742 | + for tt in tree_taxonomy: |
1743 | + tl_types.append(tree_taxonomy[tt][top_level]) |
1744 | + |
1745 | + tl_types = _uniquify(tl_types) |
1746 | + levels_to_worry_about = tlevels[0:tlevels.index(top_level)+1] |
1747 | + |
1748 | + t = Tree() |
1749 | + nodes = {} |
1750 | + nodes[top_level] = [] |
1751 | + for tl in tl_types: |
1752 | + n = t.add_child(name=tl) |
1753 | + nodes[top_level].append({tl:n}) |
1754 | + |
1755 | + for l in levels_to_worry_about[-2::-1]: |
1756 | + names = [] |
1757 | + nodes[l] = [] |
1758 | + ci = levels_to_worry_about.index(l) |
1759 | + for tt in tree_taxonomy: |
1760 | + try: |
1761 | + names.append(tree_taxonomy[tt][l]) |
1762 | + except KeyError: |
1763 | + pass |
1764 | + names = _uniquify(names) |
1765 | + for n in names: |
1766 | + # find my parent |
1767 | + parent = None |
1768 | + for tt in tree_taxonomy: |
1769 | + try: |
1770 | + if tree_taxonomy[tt][l] == n: |
1771 | + try: |
1772 | + parent = tree_taxonomy[tt][levels_to_worry_about[ci+1]] |
1773 | + level = ci+1 |
1774 | + except KeyError: |
1775 | + try: |
1776 | + parent = tree_taxonomy[tt][levels_to_worry_about[ci+2]] |
1777 | + level = ci+2 |
1778 | + except KeyError: |
1779 | + try: |
1780 | + parent = tree_taxonomy[tt][levels_to_worry_about[ci+3]] |
1781 | + level = ci+3 |
1782 | + except KeyError: |
1783 | +                                print "ERROR: tried to find some taxonomic info for "+tt+" from the tree_taxonomy file/downloaded data and I went up to three levels, but failed to find any. Looked at:\n"
1784 | + print "\t"+levels_to_worry_about[ci+1] |
1785 | + print "\t"+levels_to_worry_about[ci+2] |
1786 | + print "\t"+levels_to_worry_about[ci+3] |
1787 | + print "This is the taxonomy info I have for "+tt |
1788 | + print tree_taxonomy[tt] |
1789 | + sys.exit(1) |
1790 | + |
1791 | + k = [] |
1792 | + for nd in nodes[levels_to_worry_about[level]]: |
1793 | + k.extend(nd.keys()) |
1794 | + i = 0 |
1795 | + for kk in k: |
1796 | + if kk == parent: |
1797 | + break |
1798 | + i += 1 |
1799 | + parent_id = i |
1800 | + break |
1801 | + except KeyError: |
1802 | + pass # no data at this level for this beastie |
1803 | + # find out where to attach it |
1804 | + node_id = nodes[levels_to_worry_about[level]][parent_id][parent] |
1805 | + nd = node_id.add_child(name=n.replace(" ","_")) |
1806 | + nodes[l].append({n:nd}) |
1807 | + |
1808 | + tree = t.write(format=9) |
1809 | + |
1810 | + return tree |
1811 | + |
1812 | if __name__ == "__main__": |
1813 | main() |
1814 | |
1815 | |
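
The WoRMS, ITIS and EOL branches above all repeat the same checkpoint idea: a ".fit_lock" file records the name of a temporary file holding the partially downloaded taxonomy, so an interrupted run can pick up where it left off instead of re-querying the web service from scratch. A minimal standalone sketch of that pattern (the helper names below are illustrative only, not toolkit functions) is:

import os
import tempfile

LOCKFILE = '.fit_lock'   # same lock file name as in the hunks above

def resume_or_start(load_partial):
    # If a previous run left a lock file pointing at a readable checkpoint,
    # reload those partial results before starting again.
    partial = {}
    if os.path.isfile(LOCKFILE):
        with open(LOCKFILE) as f:
            old_tmp = f.read().strip()
        if os.path.isfile(old_tmp):
            partial = load_partial(old_tmp)
        os.remove(LOCKFILE)
    # Start a fresh checkpoint file and advertise it via the lock file.
    handle, tmpfile = tempfile.mkstemp()
    with open(LOCKFILE, 'w') as f:
        f.write(tmpfile)
    return partial, handle, tmpfile

def finish(handle, tmpfile):
    # Tidy up once the long-running download has completed successfully.
    os.close(handle)
    if os.path.isfile(LOCKFILE):
        os.remove(LOCKFILE)
    if os.path.isfile(tmpfile):
        os.remove(tmpfile)

The clean-up in finish() mirrors what each branch above does after get_taxonomy_itis/worms/eol returns: close the handle, delete the lock file, and remove the now-redundant checkpoint file.
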
1816 | === modified file 'stk/scripts/plot_character_taxa_matrix.py' |
1817 | --- stk/scripts/plot_character_taxa_matrix.py 2014-12-10 08:55:43 +0000 |
1818 | +++ stk/scripts/plot_character_taxa_matrix.py 2017-01-12 09:27:31 +0000 |
1819 | @@ -42,6 +42,18 @@ |
1820 | default=False |
1821 | ) |
1822 | parser.add_argument( |
1823 | + '-t', |
1824 | + '--taxonomy', |
1825 | + help="Use taxonomy to sort the taxa on the axis. Supply a STK taxonomy file", |
1826 | + ) |
1827 | + parser.add_argument( |
1828 | + '--level', |
1829 | + choices=['family','superfamily','infraorder','suborder','order'], |
1830 | + default='family', |
1831 | + help="""What level to group the taxonomy at. Default is family. |
1832 | +                Note data for a particular level may be missing in the taxonomy."""
1833 | + ) |
1834 | + parser.add_argument( |
1835 | 'input_file', |
1836 | metavar='input_file', |
1837 | nargs=1, |
1838 | @@ -59,14 +71,58 @@ |
1839 | verbose = args.verbose |
1840 | input_file = args.input_file[0] |
1841 | output_file = args.output_file[0] |
1842 | + taxonomy = args.taxonomy |
1843 | + level = args.level |
1844 | |
1845 | XML = stk.load_phyml(input_file) |
1846 | + if not taxonomy == None: |
1847 | + taxonomy = stk.load_taxonomy(taxonomy) |
1848 | + |
1849 | all_taxa = stk.get_all_taxa(XML) |
1850 | all_chars_d = stk.get_all_characters(XML) |
1851 | all_chars = [] |
1852 | for c in all_chars_d: |
1853 | all_chars.extend(all_chars_d[c]) |
1854 | |
1855 | + if not taxonomy == None: |
1856 | + tax_data = {} |
1857 | + new_all_taxa = [] |
1858 | + for t in all_taxa: |
1859 | + taxon = t.replace("_"," ") |
1860 | + try: |
1861 | + if taxonomy[taxon][level] == "": |
1862 | + # skip this |
1863 | + continue |
1864 | + tax_data[t] = taxonomy[taxon][level] |
1865 | + except KeyError: |
1866 | + print "Couldn't find "+t+" in taxonomy. Adding as null data" |
1867 | + tax_data[t] = 'zzzzz' # it's at the end... |
1868 | + |
1869 | + from sets import Set |
1870 | + unique = set(tax_data.values()) |
1871 | + unique = list(unique) |
1872 | + unique.sort() |
1873 | + print "Groups are:" |
1874 | + print unique |
1875 | + counts = [] |
1876 | + for u in unique: |
1877 | + count = 0 |
1878 | + for t in tax_data: |
1879 | + if tax_data[t] == u: |
1880 | + count += 1 |
1881 | + new_all_taxa.append(t) |
1882 | + counts.append(count) |
1883 | + |
1884 | + all_taxa = new_all_taxa |
1885 | + # cumulate counts |
1886 | + count_cumulate = [] |
1887 | + count_cumulate.append(counts[0]) |
1888 | + for c in counts[1::]: |
1889 | + count_cumulate.append(c+count_cumulate[-1]) |
1890 | + |
1891 | + print count_cumulate |
1892 | + |
1893 | + |
1894 | taxa_character_matrix = {} |
1895 | for t in all_taxa: |
1896 | taxa_character_matrix[t] = [] |
1897 | @@ -77,7 +133,8 @@ |
1898 | taxa = stk.get_taxa_from_tree(XML,t, sort=True) |
1899 | for taxon in taxa: |
1900 | taxon = taxon.replace(" ","_") |
1901 | - taxa_character_matrix[taxon].extend(chars) |
1902 | + if taxon in all_taxa: |
1903 | + taxa_character_matrix[taxon].extend(chars) |
1904 | |
1905 | for t in taxa_character_matrix: |
1906 | array = taxa_character_matrix[t] |
1907 | @@ -92,6 +149,31 @@ |
1908 | x.append(i) |
1909 | y.append(j) |
1910 | |
1911 | + |
1912 | + i = 0 |
1913 | + for j in all_chars: |
1914 | + # do a substitution of character names to tidy things up |
1915 | + if j.lower().startswith('mitochondrial carrier; adenine nucleotide translocator'): |
1916 | + j = "ANT" |
1917 | + if j.lower().startswith('mitochondrially encoded 12s'): |
1918 | + j = '12S' |
1919 | + if j.lower().startswith('complete mitochondrial genome'): |
1920 | + j = 'Mitogenome' |
1921 | + if j.lower().startswith('mtdna'): |
1922 | + j = "mtDNA restriction sites" |
1923 | + if j.lower().startswith('h3 histone'): |
1924 | + j = 'H3' |
1925 | + if j.lower().startswith('mitochondrially encoded cytochrome'): |
1926 | + j = 'COI' |
1927 | + if j.lower().startswith('rna, 28s'): |
1928 | + j = '28S' |
1929 | + if j.lower().startswith('rna, 18s'): |
1930 | + j = '18S' |
1931 | + if j.lower().startswith('mitochondrially encoded 16s'): |
1932 | + j = '16S' |
1933 | + all_chars[i] = j |
1934 | + i += 1 |
1935 | + |
1936 | fig=figure(figsize=(22,17),dpi=90) |
1937 | fig.subplots_adjust(left=0.3) |
1938 | ax = fig.add_subplot(1,1,1) |
1939 | |
1940 | === modified file 'stk/scripts/plot_tree_taxa_matrix.py' |
1941 | --- stk/scripts/plot_tree_taxa_matrix.py 2014-12-10 08:55:43 +0000 |
1942 | +++ stk/scripts/plot_tree_taxa_matrix.py 2017-01-12 09:27:31 +0000 |
1943 | @@ -43,6 +43,18 @@ |
1944 | default=False |
1945 | ) |
1946 | parser.add_argument( |
1947 | + '-t', |
1948 | + '--taxonomy', |
1949 | + help="Use taxonomy to sort the taxa on the axis. Supply a STK taxonomy file", |
1950 | + ) |
1951 | + parser.add_argument( |
1952 | + '--level', |
1953 | + choices=['family','superfamily','infraorder','suborder','order'], |
1954 | + default='family', |
1955 | + help="""What level to group the taxonomy at. Default is family. |
1956 | +                Note data for a particular level may be missing in the taxonomy."""
1957 | + ) |
1958 | + parser.add_argument( |
1959 | 'input_file', |
1960 | metavar='input_file', |
1961 | nargs=1, |
1962 | @@ -60,13 +72,57 @@ |
1963 | verbose = args.verbose |
1964 | input_file = args.input_file[0] |
1965 | output_file = args.output_file[0] |
1966 | + taxonomy = args.taxonomy |
1967 | + level = args.level |
1968 | |
1969 | XML = stk.load_phyml(input_file) |
1970 | + if not taxonomy == None: |
1971 | + taxonomy = stk.load_taxonomy(taxonomy) |
1972 | + |
1973 | all_taxa = stk.get_all_taxa(XML) |
1974 | |
1975 | taxa_tree_matrix = {} |
1976 | for t in all_taxa: |
1977 | taxa_tree_matrix[t] = [] |
1978 | + |
1979 | + if not taxonomy == None: |
1980 | + tax_data = {} |
1981 | + new_all_taxa = [] |
1982 | + for t in all_taxa: |
1983 | + taxon = t.replace("_"," ") |
1984 | + try: |
1985 | + if taxonomy[taxon][level] == "": |
1986 | + # skip this |
1987 | + continue |
1988 | + tax_data[t] = taxonomy[taxon][level] |
1989 | + except KeyError: |
1990 | + print "Couldn't find "+t+" in taxonomy. Adding as null data" |
1991 | + tax_data[t] = 'zzzzz' # it's at the end... |
1992 | + |
1993 | + from sets import Set |
1994 | + unique = set(tax_data.values()) |
1995 | + unique = list(unique) |
1996 | + unique.sort() |
1997 | + print "Groups are:" |
1998 | + print unique |
1999 | + counts = [] |
2000 | + for u in unique: |
2001 | + count = 0 |
2002 | + for t in tax_data: |
2003 | + if tax_data[t] == u: |
2004 | + count += 1 |
2005 | + new_all_taxa.append(t) |
2006 | + counts.append(count) |
2007 | + |
2008 | + all_taxa = new_all_taxa |
2009 | + # cumulate counts |
2010 | + count_cumulate = [] |
2011 | + count_cumulate.append(counts[0]) |
2012 | + for c in counts[1::]: |
2013 | + count_cumulate.append(c+count_cumulate[-1]) |
2014 | + |
2015 | + print count_cumulate |
2016 | + |
2017 | |
2018 | trees = stk.obtain_trees(XML) |
2019 | i = 0 |
2020 | |
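
Both plotting scripts gain the same taxonomy-aware ordering: each taxon is mapped to its group at the chosen level (taxa with no data at that level sort last under a dummy 'zzzzz' key), the taxa are re-ordered group by group, and a running total of group sizes gives the axis positions of the group boundaries. A small self-contained sketch of that bookkeeping, using made-up taxa, is:

# Group taxa by one taxonomic level and compute cumulative group boundaries,
# mirroring the sorting logic added to the two plot scripts above (toy data).
taxonomy = {
    'Aus bus': {'family': 'Aidae'},
    'Aus cus': {'family': 'Aidae'},
    'Dus eus': {'family': 'Didae'},
    'Fus gus': {},                     # no family data -> sorts to the end
}
level = 'family'

tax_data = {}
for taxon in taxonomy:
    tax_data[taxon] = taxonomy[taxon].get(level, 'zzzzz')

groups = sorted(set(tax_data.values()))
sorted_taxa = []
counts = []
for g in groups:
    members = [t for t in tax_data if tax_data[t] == g]
    sorted_taxa.extend(members)
    counts.append(len(members))

# cumulative counts mark where each group ends along the axis
count_cumulate = []
for c in counts:
    count_cumulate.append(c + (count_cumulate[-1] if count_cumulate else 0))
# here groups == ['Aidae', 'Didae', 'zzzzz'] and count_cumulate == [2, 3, 4]
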
2021 | === modified file 'stk/scripts/remove_poorly_constrained_taxa.py' |
2022 | --- stk/scripts/remove_poorly_constrained_taxa.py 2014-04-18 11:57:14 +0000 |
2023 | +++ stk/scripts/remove_poorly_constrained_taxa.py 2017-01-12 09:27:31 +0000 |
2024 | @@ -12,8 +12,8 @@ |
2025 | |
2026 | # do stuff |
2027 | parser = argparse.ArgumentParser( |
2028 | - prog="convert tree from specific to generic", |
2029 | - description="""Converts a tree at specific level to generic level""", |
2030 | +        prog="remove poorly constrained taxa",
2031 | +        description="""Remove taxa that appear in one source tree only.""",
2032 | ) |
2033 | parser.add_argument( |
2034 | '-v', |
2035 | @@ -34,6 +34,13 @@ |
2036 | " to removal those in polytomies *and* only in one other tree." |
2037 | ) |
2038 | parser.add_argument( |
2039 | + '--tree_only', |
2040 | + default=False, |
2041 | + action='store_true', |
2042 | +        help="Restrict removal to taxa that only occur in one source tree. Default"+
2043 | +             " is to remove those in polytomies *and* in only one other tree."
2044 | + ) |
2045 | + parser.add_argument( |
2046 | 'input_phyml', |
2047 | metavar='input_phyml', |
2048 | nargs=1, |
2049 | @@ -43,13 +50,13 @@ |
2050 | 'input_tree', |
2051 | metavar='input_tree', |
2052 | nargs=1, |
2053 | - help="Your tree" |
2054 | + help="Your tree - can be NULL or None" |
2055 | ) |
2056 | parser.add_argument( |
2057 | 'output_tree', |
2058 | metavar='output_tree', |
2059 | nargs=1, |
2060 | - help="Your output tree" |
2061 | + help="Your output tree or phyml - if input_tree is none, this is the Phyml" |
2062 | ) |
2063 | |
2064 | |
2065 | @@ -62,14 +69,20 @@ |
2066 | dl = True |
2067 | poly_only = args.poly_only |
2068 | input_tree = args.input_tree[0] |
2069 | - output_tree = args.output_tree[0] |
2070 | + if input_tree == 'NULL' or input_tree == 'None': |
2071 | + input_tree = None |
2072 | + output_file = args.output_tree[0] |
2073 | input_phyml = args.input_phyml[0] |
2074 | |
2075 | XML = stk.load_phyml(input_phyml) |
2076 | # load tree |
2077 | - supertree = stk.import_tree(input_tree) |
2078 | + if (not input_tree == None): |
2079 | + supertree = stk.import_tree(input_tree) |
2080 | + taxa = stk._getTaxaFromNewick(supertree) |
2081 | + else: |
2082 | + supertree = None |
2083 | + taxa = stk.get_all_taxa(XML) |
2084 | # grab taxa |
2085 | - taxa = stk._getTaxaFromNewick(supertree) |
2086 | delete_list = [] |
2087 | |
2088 | # loop over taxa in supertree and get some stats |
2089 | @@ -115,19 +128,29 @@ |
2090 | |
2091 | print "Taxa: "+str(len(taxa)) |
2092 | print "Deleting: "+str(len(delete_list)) |
2093 | - # done, so delete the problem taxa from the supertree |
2094 | - for t in delete_list: |
2095 | - # remove taxa from supertree |
2096 | - supertree = stk._sub_taxa_in_tree(supertree,t) |
2097 | - |
2098 | - # save supertree |
2099 | - tree = {} |
2100 | - tree['Tree_1'] = supertree |
2101 | - output = stk._amalgamate_trees(tree,format='nexus') |
2102 | - # write file |
2103 | - f = open(output_tree,"w") |
2104 | - f.write(output) |
2105 | - f.close() |
2106 | + |
2107 | + if not supertree == None: |
2108 | + # done, so delete the problem taxa from the supertree |
2109 | + for t in delete_list: |
2110 | + # remove taxa from supertree |
2111 | + supertree = stk._sub_taxa_in_tree(supertree,t) |
2112 | + |
2113 | + # save supertree |
2114 | + tree = {} |
2115 | + tree['Tree_1'] = supertree |
2116 | + output = stk._amalgamate_trees(tree,format='nexus') |
2117 | + # write file |
2118 | + f = open(output_file,"w") |
2119 | + f.write(output) |
2120 | + f.close() |
2121 | + else: |
2122 | + new_phyml = stk.substitute_taxa(XML,delete_list) |
2123 | + # write file |
2124 | + f = open(output_file,"w") |
2125 | + f.write(new_phyml) |
2126 | + f.close() |
2127 | + |
2128 | + |
2129 | |
2130 | if (dl): |
2131 | # write file |
2132 | |
2133 | === added file 'stk/scripts/tree_from_taxonomy.py' |
2134 | --- stk/scripts/tree_from_taxonomy.py 1970-01-01 00:00:00 +0000 |
2135 | +++ stk/scripts/tree_from_taxonomy.py 2017-01-12 09:27:31 +0000 |
2136 | @@ -0,0 +1,142 @@ |
2137 | +# trees ready for supertree construction.
2138 | +# Copyright (C) 2015, Jon Hill, Katie Davis |
2139 | +# |
2140 | +# This program is free software: you can redistribute it and/or modify |
2141 | +# it under the terms of the GNU General Public License as published by |
2142 | +# the Free Software Foundation, either version 3 of the License, or |
2143 | +# (at your option) any later version. |
2144 | +# |
2145 | +# This program is distributed in the hope that it will be useful, |
2146 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
2147 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
2148 | +# GNU General Public License for more details. |
2149 | +# |
2150 | +# You should have received a copy of the GNU General Public License |
2151 | +# along with this program. If not, see <http://www.gnu.org/licenses/>. |
2152 | +# |
2153 | +# Jon Hill. jon.hill@york.ac.uk |
2154 | + |
2155 | +import argparse |
2156 | +import copy |
2157 | +import os |
2158 | +import sys |
2159 | +stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir ) |
2160 | +sys.path.insert(0, stk_path) |
2161 | +import supertree_toolkit as stk |
2162 | +import csv |
2163 | +from ete2 import Tree |
2164 | + |
2165 | +taxonomy_levels = ['species','subgenus','genus','subfamily','family','superfamily','subsection','section','infraorder','suborder','order','superorder','subclass','class','superclass','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] |
2166 | +tlevels = ['species','genus','family','order','class','phylum','kingdom'] |
2167 | + |
2168 | + |
2169 | +def main(): |
2170 | + |
2171 | + # do stuff |
2172 | + parser = argparse.ArgumentParser( |
2173 | + prog="create a tree from a taxonomy file", |
2174 | + description="Create a taxonomic tree", |
2175 | + ) |
2176 | + parser.add_argument( |
2177 | + '-v', |
2178 | + '--verbose', |
2179 | + action='store_true', |
2180 | + help="Verbose output: mainly progress reports.", |
2181 | + default=False |
2182 | + ) |
2183 | + parser.add_argument( |
2184 | + 'top_level', |
2185 | + nargs=1, |
2186 | + help="The top level group to start with, e.g. family" |
2187 | + ) |
2188 | + parser.add_argument( |
2189 | + 'input_file', |
2190 | + metavar='input_file', |
2191 | + nargs=1, |
2192 | + help="Your taxonomy file" |
2193 | + ) |
2194 | + parser.add_argument( |
2195 | + 'output_file', |
2196 | + metavar='output_file', |
2197 | + nargs=1, |
2198 | + help="Your new tree file" |
2199 | + ) |
2200 | + |
2201 | + args = parser.parse_args() |
2202 | + verbose = args.verbose |
2203 | + input_file = args.input_file[0] |
2204 | + output_file = args.output_file[0] |
2205 | + top_level = args.top_level[0] |
2206 | + |
2207 | + start_level = taxonomy_levels.index(top_level) |
2208 | + tree_taxonomy = stk.load_taxonomy(input_file) |
2209 | + new_taxa = tree_taxonomy.keys() |
2210 | + |
2211 | + tl_types = [] |
2212 | + for tt in tree_taxonomy: |
2213 | + tl_types.append(tree_taxonomy[tt][top_level]) |
2214 | + |
2215 | + tl_types = _uniquify(tl_types) |
2216 | + levels_to_worry_about = tlevels[0:tlevels.index(top_level)+1] |
2217 | + |
2218 | + #print levels_to_worry_about[-2::-1] |
2219 | + |
2220 | + t = Tree() |
2221 | + nodes = {} |
2222 | + nodes[top_level] = [] |
2223 | + for tl in tl_types: |
2224 | + n = t.add_child(name=tl) |
2225 | + nodes[top_level].append({tl:n}) |
2226 | + |
2227 | + for l in levels_to_worry_about[-2::-1]: |
2228 | + #print t |
2229 | + names = [] |
2230 | + nodes[l] = [] |
2231 | + ci = levels_to_worry_about.index(l) |
2232 | + for tt in tree_taxonomy: |
2233 | + names.append(tree_taxonomy[tt][l]) |
2234 | + names = _uniquify(names) |
2235 | + for n in names: |
2236 | + #print n |
2237 | + # find my parent |
2238 | + parent = None |
2239 | + for tt in tree_taxonomy: |
2240 | + if tree_taxonomy[tt][l] == n: |
2241 | + parent = tree_taxonomy[tt][levels_to_worry_about[ci+1]] |
2242 | + k = [] |
2243 | + for nd in nodes[levels_to_worry_about[ci+1]]: |
2244 | + k.extend(nd.keys()) |
2245 | + i = 0 |
2246 | + for kk in k: |
2247 | +                    #print kk
2248 | + if kk == parent: |
2249 | + break |
2250 | + i += 1 |
2251 | + parent_id = i |
2252 | + break |
2253 | + # find out where to attach it |
2254 | + node_id = nodes[levels_to_worry_about[ci+1]][parent_id][parent] |
2255 | + nd = node_id.add_child(name=n.replace(" ","_")) |
2256 | + nodes[l].append({n:nd}) |
2257 | + |
2258 | + tree = t.write(format=9) |
2259 | + tree = stk._collapse_nodes(tree) |
2260 | + tree = stk._collapse_nodes(tree) |
2261 | + print tree |
2262 | + |
2263 | + |
2264 | +def _uniquify(l): |
2265 | + """ |
2266 | + Make a list, l, contain only unique data |
2267 | + """ |
2268 | + keys = {} |
2269 | + for e in l: |
2270 | + keys[e] = 1 |
2271 | + |
2272 | + return keys.keys() |
2273 | + |
2274 | +if __name__ == "__main__": |
2275 | + main() |
2276 | + |
2277 | + |
2278 | + |
2279 | |
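
The new tree_from_taxonomy.py script (and the similar helper added to fill_in_with_taxonomy.py above) turns a taxonomy table into a Newick tree by adding one node per name at each rank underneath the node of its parent rank. A cut-down sketch of the idea using ete2 and toy data (the taxa below are invented for illustration) is:

# Build a small taxonomic tree rank by rank from a dict of classifications,
# a toy version of the approach used in the new script.
from ete2 import Tree

toy_taxonomy = {
    'Aus bus': {'genus': 'Aus', 'family': 'Aidae'},
    'Aus cus': {'genus': 'Aus', 'family': 'Aidae'},
    'Dus eus': {'genus': 'Dus', 'family': 'Didae'},
}

t = Tree()
family_nodes = {}
for fam in set(d['family'] for d in toy_taxonomy.values()):
    family_nodes[fam] = t.add_child(name=fam)       # one node per family

genus_nodes = {}
for taxon, d in toy_taxonomy.items():
    g = d['genus']
    if g not in genus_nodes:
        # attach each genus under its family node
        genus_nodes[g] = family_nodes[d['family']].add_child(name=g)
    # attach the species (underscored, as elsewhere in the toolkit) under its genus
    genus_nodes[g].add_child(name=taxon.replace(' ', '_'))

newick = t.write(format=9)   # leaf names only, as the script itself writes

The real script then collapses the resulting single-child nodes with stk._collapse_nodes before printing the Newick string.
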
2280 | === modified file 'stk/stk' |
2281 | --- stk/stk 2014-12-09 10:58:48 +0000 |
2282 | +++ stk/stk 2017-01-12 09:27:31 +0000 |
2283 | @@ -23,6 +23,7 @@ |
2284 | import sys |
2285 | import argparse |
2286 | import traceback |
2287 | +import time |
2288 | try: |
2289 | __file__ |
2290 | except NameError: |
2291 | @@ -41,6 +42,10 @@ |
2292 | import string |
2293 | import stk.p4 as p4 |
2294 | import lxml |
2295 | +import csv |
2296 | +import tempfile |
2297 | +from subprocess import check_call, CalledProcessError, call |
2298 | + |
2299 | import stk.bzr_version as bzr_version |
2300 | d = bzr_version.version_info |
2301 | build = d.get('revno','<unknown revno>') |
2302 | @@ -366,7 +371,7 @@ |
2303 | |
2304 | # Clean data |
2305 | parser_cm = subparsers.add_parser('clean_data', |
2306 | - help='Remove errant taxa, uninformative trees and empty sources.' |
2307 | + help='Renames all sources and trees sensibly. Removes errant taxa, uninformative trees and empty sources.' |
2308 | ) |
2309 | parser_cm.add_argument('input', |
2310 | help='The input phyml file') |
2311 | @@ -488,7 +493,81 @@ |
2312 | parser_cm.add_argument('subs', |
2313 | help='The subs file') |
2314 | parser_cm.set_defaults(func=check_subs) |
2315 | - |
2316 | + |
2317 | + # taxonomic name checker |
2318 | + parser_cm = subparsers.add_parser('check_otus', |
2319 | + help='Check your OTUs against EoL.' |
2320 | + ) |
2321 | + parser_cm.add_argument('input', |
2322 | + help='The input Phyml. Also accepts tree files or a simple list') |
2323 | + parser_cm.add_argument('output', |
2324 | + help='The output CSV file. Taxon, synonyms, status') |
2325 | + parser_cm.add_argument('--overwrite', |
2326 | + action='store_true', |
2327 | + default=False, |
2328 | + help="Overwrite the existing file without asking for confirmation") |
2329 | + parser_cm.set_defaults(func=check_otus) |
2330 | + |
2331 | + # create taxonomy csv file |
2332 | + parser_cm = subparsers.add_parser('create_taxonomy', |
2333 | + help='Create a taxonomy file in CSV for you to then augment.' |
2334 | + ) |
2335 | + parser_cm.add_argument('input', |
2336 | + help='The input Phyml. Also accepts tree files or a simple list') |
2337 | + parser_cm.add_argument('output', |
2338 | + help='The output CSV file. Name, followed by classification and source') |
2339 | + parser_cm.add_argument('--overwrite', |
2340 | + action='store_true', |
2341 | + default=False, |
2342 | + help="Overwrite the existing file without asking for confirmation") |
2343 | + parser_cm.add_argument('--taxonomy', |
2344 | + help="Give a starting taxonomy file, e.g. one you ran earlier",) |
2345 | + parser_cm.set_defaults(func=create_taxonomy) |
2346 | + |
2347 | + |
2348 | + # do the subs in a one go using taxonomy |
2349 | + parser_cm = subparsers.add_parser('auto_subs', |
2350 | + help='Using a taxonomy, generate a species level version of your data in one go.' |
2351 | + ) |
2352 | + parser_cm.add_argument('input', |
2353 | + help='The input Phyml') |
2354 | + parser_cm.add_argument('taxonomy', |
2355 | + help='Your taxonomy file', |
2356 | + ) |
2357 | + parser_cm.add_argument('output', |
2358 | + help='The output phyml') |
2359 | + parser_cm.add_argument('--overwrite', |
2360 | + action='store_true', |
2361 | + default=False, |
2362 | + help="Overwrite the existing file without asking for confirmation") |
2363 | + #parser_cm.add_argument('--level', |
2364 | + # choices=supertree_toolkit.taxonomy_levels, |
2365 | + # help="Taxonomic level to output at",) |
2366 | + parser_cm.set_defaults(func=auto_subs) |
2367 | + |
2368 | + |
2369 | + # attempt to process the data into a matrix all automatically |
2370 | + parser_cm = subparsers.add_parser('process', |
2371 | + help='Generate a species-level matrix, and do all the checks and processing automatically. Note this creates a taxonomy and does all the processing, but will not be perfect (as taxonomies are not perfect)' |
2372 | + ) |
2373 | + parser_cm.add_argument('input', |
2374 | + help='The input Phyml') |
2375 | + parser_cm.add_argument('output', |
2376 | + help='The output matrix') |
2377 | + parser_cm.add_argument('--taxonomy_file', |
2378 | +            help='Existing taxonomy file to prevent redownloading data. Any taxa not in the file will be checked online, so partially complete files are OK.')
2379 | + parser_cm.add_argument('--equivalents_file', |
2380 | + help='Existing equivalents file from a taxonomic name check. Any taxa not in the file will be checked online, so partially complete files are OK.') |
2381 | + parser_cm.add_argument('--overwrite', |
2382 | + action='store_true', |
2383 | + default=False, |
2384 | + help="Overwrite the existing file without asking for confirmation") |
2385 | + parser_cm.add_argument('--no_store', |
2386 | + action="store_true", |
2387 | + default=False, |
2388 | + help="Do not store intermediate files -- not recommended") |
2389 | + parser_cm.set_defaults(func=process) |
2390 | + |
2391 | |
2392 | # before we let argparse work its magic, check for --version |
2393 | if "--version" in sys.argv: |
2394 | @@ -602,7 +681,7 @@ |
2395 | # check if output files are there |
2396 | if (output_file and os.path.exists(output_file) and not overwrite): |
2397 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2398 | - print "Do you wish to continue? [Y/n]" |
2399 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2400 | while True: |
2401 | k=inkey() |
2402 | if k.lower() == 'n': |
2403 | @@ -612,7 +691,7 @@ |
2404 | break |
2405 | if (not newphyml == None and os.path.exists(newphyml) and not overwrite): |
2406 | print "Output Phyml file exists. Either remove the file or use the --overwrite flag." |
2407 | - print "Do you wish to continue? [Y/n]" |
2408 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2409 | while True: |
2410 | k=inkey() |
2411 | if k.lower() == 'n': |
2412 | @@ -624,9 +703,9 @@ |
2413 | XML = supertree_toolkit.load_phyml(input_file) |
2414 | try: |
2415 | if (newphyml == None): |
2416 | - data_independence = supertree_toolkit.data_independence(XML,ignoreWarnings=ignoreWarnings) |
2417 | + data_independence, subsets = supertree_toolkit.data_independence(XML,ignoreWarnings=ignoreWarnings) |
2418 | else: |
2419 | - data_independence, new_phyml = supertree_toolkit.data_independence(XML,make_new_xml=True,ignoreWarnings=ignoreWarnings) |
2420 | + data_independence, subsets, new_phyml = supertree_toolkit.data_independence(XML,make_new_xml=True,ignoreWarnings=ignoreWarnings) |
2421 | except NotUniqueError as detail: |
2422 | msg = "***Error: Failed to check independence.\n"+detail.msg |
2423 | print msg |
2424 | @@ -644,7 +723,7 @@ |
2425 | print msg |
2426 | return |
2427 | except: |
2428 | - msg = "***Error: failed to check independence due to unknown error." |
2429 | + msg = "***Error: failed to check independence due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" |
2430 | print msg |
2431 | traceback.print_exc() |
2432 | return |
2433 | @@ -653,16 +732,14 @@ |
2434 | data_ind = "" |
2435 | #column headers |
2436 | data_ind = "Source trees that are subsets of others\n" |
2437 | - data_ind = data_ind + "Flagged tree, is a subset of:\n" |
2438 | - for name in data_independence: |
2439 | - if ( data_independence[name][1] == supertree_toolkit.SUBSET ): |
2440 | - data_ind += name + "," + data_independence[name][0] + "\n" |
2441 | + data_ind = data_ind + "Flagged tree(s), is/are subset(s) of:\n" |
2442 | + for names in subsets: |
2443 | + data_ind += names[1:] + "," + names[0] + "\n" |
2444 | |
2445 | data_ind += "\n\nSource trees that are identical to others\n" |
2446 | - data_ind = data_ind + "Flagged tree, is identical to:\n" |
2447 | - for name in data_independence: |
2448 | - if ( data_independence[name][1] == supertree_toolkit.IDENTICAL ): |
2449 | - data_ind += name + "," + data_independence[name][0] + "\n" |
2450 | + data_ind = data_ind + "Flagged tree(s), is/are identical to:\n" |
2451 | + for names in data_independence: |
2452 | + data_ind += names[1:] + "," + names[0] + "\n" |
2453 | |
2454 | |
2455 | if (output_file == False or |
2456 | @@ -762,7 +839,7 @@ |
2457 | # Does the output file already exist? |
2458 | if (os.path.exists(output_file) and not overwrite): |
2459 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2460 | - print "Do you wish to continue? [Y/n]" |
2461 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2462 | while True: |
2463 | k=inkey() |
2464 | if k.lower() == 'n': |
2465 | @@ -771,6 +848,7 @@ |
2466 | if k.lower() == 'y': |
2467 | break |
2468 | try: |
2469 | + |
2470 | XML = supertree_toolkit.load_phyml(input_file) |
2471 | input_is_xml = True |
2472 | except: |
2473 | @@ -896,7 +974,7 @@ |
2474 | # Does the output file already exist? |
2475 | if (os.path.exists(output_file) and not overwrite): |
2476 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2477 | - print "Do you wish to continue? [Y/n]" |
2478 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2479 | while True: |
2480 | k=inkey() |
2481 | if k.lower() == 'n': |
2482 | @@ -942,7 +1020,7 @@ |
2483 | print msg |
2484 | return |
2485 | except: |
2486 | - msg = "***Error: Failed sbstituting taxa due to unknown error.\n" |
2487 | +        msg = "***Error: Failed substituting taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
2488 | print msg |
2489 | traceback.print_exc() |
2490 | return |
2491 | @@ -983,7 +1061,7 @@ |
2492 | |
2493 | if (os.path.exists(output_file) and not overwrite): |
2494 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2495 | - print "Do you wish to continue? [Y/n]" |
2496 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2497 | while True: |
2498 | k=inkey() |
2499 | if k.lower() == 'n': |
2500 | @@ -1013,7 +1091,7 @@ |
2501 | print msg |
2502 | return |
2503 | except: |
2504 | - msg = "***Error: Failed sbstituting taxa due to unknown error.\n" |
2505 | +        msg = "***Error: Failed substituting taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n"
2506 | print msg |
2507 | traceback.print_exc() |
2508 | return |
2509 | @@ -1060,7 +1138,7 @@ |
2510 | print msg |
2511 | return |
2512 | except: |
2513 | - msg = "***Error: Failed to export data due to unknown error.\n" |
2514 | + msg = "***Error: Failed to export data due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2515 | print msg |
2516 | traceback.print_exc() |
2517 | return |
2518 | @@ -1115,7 +1193,7 @@ |
2519 | print msg |
2520 | return |
2521 | except: |
2522 | - msg = "***Error: Failed to check overlap due to unknown error.\n" |
2523 | + msg = "***Error: Failed to check overlap due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2524 | print msg |
2525 | traceback.print_exc() |
2526 | return |
2527 | @@ -1161,7 +1239,7 @@ |
2528 | # check if output files are there |
2529 | if (output_file and os.path.exists(output_file) and not overwrite): |
2530 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2531 | - print "Do you wish to continue? [Y/n]" |
2532 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2533 | while True: |
2534 | k=inkey() |
2535 | if k.lower() == 'n': |
2536 | @@ -1191,7 +1269,7 @@ |
2537 | print msg |
2538 | return |
2539 | except: |
2540 | - msg = "***Error: Failed to export trees due to unknown error.\n" |
2541 | + msg = "***Error: Failed to export trees due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2542 | print msg |
2543 | traceback.print_exc() |
2544 | return |
2545 | @@ -1220,7 +1298,7 @@ |
2546 | # check if output files are there |
2547 | if (output_file and os.path.exists(output_file) and not overwrite): |
2548 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2549 | - print "Do you wish to continue? [Y/n]" |
2550 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2551 | while True: |
2552 | k=inkey() |
2553 | if k.lower() == 'n': |
2554 | @@ -1309,7 +1387,7 @@ |
2555 | print msg |
2556 | return |
2557 | except: |
2558 | - msg = "***Error: Failed to permute trees due to unknown error.\n" |
2559 | + msg = "***Error: Failed to permute trees due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2560 | print msg |
2561 | traceback.print_exc() |
2562 | return |
2563 | @@ -1347,7 +1425,7 @@ |
2564 | # check if output files are there |
2565 | if (os.path.exists(output_file) and not overwrite): |
2566 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2567 | - print "Do you wish to continue? [Y/n]" |
2568 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2569 | while True: |
2570 | k=inkey() |
2571 | if k.lower() == 'n': |
2572 | @@ -1376,7 +1454,7 @@ |
2573 | print msg |
2574 | return |
2575 | except: |
2576 | - msg = "***Error: Failed to clean data due to unknown error.\n" |
2577 | + msg = "***Error: Failed to clean data due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2578 | print msg |
2579 | traceback.print_exc() |
2580 | return |
2581 | @@ -1404,7 +1482,7 @@ |
2582 | # check if output files are there |
2583 | if (os.path.exists(output_file) and not overwrite): |
2584 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2585 | - print "Do you wish to continue? [Y/n]" |
2586 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2587 | while True: |
2588 | k=inkey() |
2589 | if k.lower() == 'n': |
2590 | @@ -1433,7 +1511,7 @@ |
2591 | print msg |
2592 | return |
2593 | except: |
2594 | - msg = "***Error: Failed to replace genera due to unknown error.\n" |
2595 | + msg = "***Error: Failed to replace genera due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2596 | print msg |
2597 | traceback.print_exc() |
2598 | return |
2599 | @@ -1488,7 +1566,7 @@ |
2600 | new_trees = {} |
2601 | i = 1 |
2602 | for t in trees: |
2603 | - new_trees['tree_'+str(i)] = t |
2604 | + new_trees['tree_'+str(i)] = supertree_toolkit._collapse_nodes(t) |
2605 | i += 1 |
2606 | output = supertree_toolkit._amalgamate_trees(new_trees,format=output_format) |
2607 | except TreeParseError as detail: |
2608 | @@ -1503,7 +1581,7 @@ |
2609 | # check if output files are there |
2610 | if (os.path.exists(output_file) and not overwrite): |
2611 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2612 | - print "Do you wish to continue? [Y/n]" |
2613 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2614 | while True: |
2615 | k=inkey() |
2616 | if k.lower() == 'n': |
2617 | @@ -1540,7 +1618,7 @@ |
2618 | # check if output files are there |
2619 | if (os.path.exists(output_file) and not overwrite): |
2620 | print "Output file exists. Either remove the file or use the --overwrite flag." |
2621 | - print "Do you wish to continue? [Y/n]" |
2622 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2623 | while True: |
2624 | k=inkey() |
2625 | if k.lower() == 'n': |
2626 | @@ -1589,7 +1667,7 @@ |
2627 | print msg |
2628 | return |
2629 | except: |
2630 | - msg = "***Error: Failed to create subset due to unknown error.\n" |
2631 | + msg = "***Error: Failed to create subset due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2632 | print msg |
2633 | traceback.print_exc() |
2634 | return |
2635 | @@ -1637,6 +1715,681 @@ |
2636 | print "**************************************************************\n" |
2637 | |
2638 | |
2639 | +def check_otus(args): |
2640 | + """check out the OTUs in the Phyml - are they considered valid?""" |
2641 | + |
2642 | + verbose = args.verbose |
2643 | + input_file = args.input |
2644 | + output_file = args.output |
2645 | + |
2646 | + print input_file |
2647 | + if (input_file.endswith(".phyml")): |
2648 | + XML = supertree_toolkit.load_phyml(input_file) |
2649 | + try: |
2650 | + equivs = supertree_toolkit.taxonomic_checker(XML, verbose=verbose) |
2651 | + except NotUniqueError as detail: |
2652 | + msg = "***Error: Failed to check OTUs.\n"+detail.msg |
2653 | + print msg |
2654 | + return |
2655 | + except InvalidSTKData as detail: |
2656 | + msg = "***Error: Failed to check OTUs.\n"+detail.msg |
2657 | + print msg |
2658 | + return |
2659 | + except UninformativeTreeError as detail: |
2660 | + msg = "***Error: Failed to check OTUs.\n"+detail.msg |
2661 | + print msg |
2662 | + return |
2663 | + except TreeParseError as detail: |
2664 | + msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg |
2665 | + print msg |
2666 | + return |
2667 | + except: |
2668 | +            # what about no internet connection? What error does that throw?
2669 | + msg = "***Error: failed to create OTUs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" |
2670 | + print msg |
2671 | + traceback.print_exc() |
2672 | + return |
2673 | + elif (input_file.endswith(".txt") or input_file.endswith('.dat')): |
2674 | + # read file - assume one taxa per line |
2675 | + with open(input_file,'r') as f: |
2676 | + lines = f.read().splitlines() |
2677 | + equivs = supertree_toolkit.taxonomic_checker_list(lines, verbose=verbose) |
2678 | + else: |
2679 | + # assume a tree! |
2680 | + equivs = supertree_toolkit.taxonomic_checker_tree(input_file, verbose=verbose) |
2681 | + |
2682 | + |
2683 | + |
2684 | + f = open(output_file,"w") |
2685 | + for taxon in sorted(equivs.keys()): |
2686 | + f.write(taxon+","+";".join(equivs[taxon][0])+","+equivs[taxon][1]+"\n") |
2687 | + f.close() |
2688 | + |
2689 | + |
2690 | + |
2691 | +def create_taxonomy(args): |
2692 | +    """create a taxonomic hierarchy for each OTU in the Phyml"""
2693 | + |
2694 | + verbose = args.verbose |
2695 | + input_file = args.input |
2696 | + output_file = args.output |
2697 | + existing_taxonomy = args.taxonomy |
2698 | + ignoreWarnings = args.ignoreWarnings |
2699 | + |
2700 | + XML = supertree_toolkit.load_phyml(input_file) |
2701 | + if (not existing_taxonomy == None): |
2702 | + existing_taxonomy = supertree_toolkit.load_taxonomy(existing_taxonomy) # load it in and create the dictionary |
2703 | + pass |
2704 | + |
2705 | + try: |
2706 | + taxonomy = supertree_toolkit.create_taxonomy(XML,existing_taxonomy=existing_taxonomy,verbose=verbose,ignoreWarnings=ignoreWarnings) |
2707 | + except NotUniqueError as detail: |
2708 | + msg = "***Error: Failed to create taxonomy.\n"+detail.msg |
2709 | + print msg |
2710 | + return |
2711 | + except InvalidSTKData as detail: |
2712 | + msg = "***Error: Failed to create taxonomy.\n"+detail.msg |
2713 | + print msg |
2714 | + return |
2715 | + except UninformativeTreeError as detail: |
2716 | + msg = "***Error: Failed to create taxonomy.\n"+detail.msg |
2717 | + print msg |
2718 | + return |
2719 | + except TreeParseError as detail: |
2720 | + msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg |
2721 | + print msg |
2722 | + return |
2723 | + except: |
2724 | +        # what about no internet connection? What error does that throw?
2725 | + msg = "***Error: failed to create taxonomy due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" |
2726 | + print msg |
2727 | + traceback.print_exc() |
2728 | + return |
2729 | + |
2730 | + # Now create the CSV output |
2731 | + with open(output_file, 'w') as f: |
2732 | + writer = csv.writer(f) |
2733 | + headers = [] |
2734 | + headers.append("OTU") |
2735 | + headers.extend(supertree_toolkit.taxonomy_levels) |
2736 | + headers.append("Data source") |
2737 | + writer.writerow(headers) |
2738 | + for t in taxonomy: |
2739 | + otu = t |
2740 | + try: |
2741 | + species = taxonomy[t]['species'] |
2742 | + except KeyError: |
2743 | + species = "-" |
2744 | + try: |
2745 | + genus = taxonomy[t]['genus'] |
2746 | + except KeyError: |
2747 | + genus = "-" |
2748 | + try: |
2749 | + family = taxonomy[t]['family'] |
2750 | + except KeyError: |
2751 | + family = "-" |
2752 | + try: |
2753 | + superfamily = taxonomy[t]['superfamily'] |
2754 | + except KeyError: |
2755 | + superfamily = "-" |
2756 | + try: |
2757 | + infraorder = taxonomy[t]['infraorder'] |
2758 | + except KeyError: |
2759 | + infraorder = "-" |
2760 | + try: |
2761 | + suborder = taxonomy[t]['suborder'] |
2762 | + except KeyError: |
2763 | + suborder = "-" |
2764 | + try: |
2765 | + order = taxonomy[t]['order'] |
2766 | + except KeyError: |
2767 | + order = "-" |
2768 | + try: |
2769 | + superorder = taxonomy[t]['superorder'] |
2770 | + except KeyError: |
2771 | + superorder = "-" |
2772 | + try: |
2773 | + subclass = taxonomy[t]['subclass'] |
2774 | + except KeyError: |
2775 | + subclass = "-" |
2776 | + try: |
2777 | + tclass = taxonomy[t]['class'] |
2778 | + except KeyError: |
2779 | + tclass = "-" |
2780 | + try: |
2781 | + subphylum = taxonomy[t]['subphylum'] |
2782 | + except KeyError: |
2783 | + subphylum = "-" |
2784 | + try: |
2785 | + phylum = taxonomy[t]['phylum'] |
2786 | + except KeyError: |
2787 | + phylum = "-" |
2788 | + try: |
2789 | + superphylum = taxonomy[t]['superphylum'] |
2790 | + except KeyError: |
2791 | + superphylum = "-" |
2792 | + try: |
2793 | + infrakingdom = taxonomy[t]['infrakingdom'] |
2794 | + except: |
2795 | + infrakingdom = "-" |
2796 | + try: |
2797 | + subkingdom = taxonomy[t]['subkingdom'] |
2798 | + except: |
2799 | + subkingdom = "-" |
2800 | + try: |
2801 | + kingdom = taxonomy[t]['kingdom'] |
2802 | + except KeyError: |
2803 | + kingdom = "-" |
2804 | + try: |
2805 | + provider = taxonomy[t]['provider'] |
2806 | + except KeyError: |
2807 | + provider = "-" |
2808 | + |
2809 | + if (isinstance(species, list)): |
2810 | + species = " ".join(species) |
2811 | + this_classification = [ |
2812 | + otu.encode('utf-8'), |
2813 | + species.encode('utf-8'), |
2814 | + genus.encode('utf-8'), |
2815 | + family.encode('utf-8'), |
2816 | + superfamily.encode('utf-8'), |
2817 | + infraorder.encode('utf-8'), |
2818 | + suborder.encode('utf-8'), |
2819 | + order.encode('utf-8'), |
2820 | + superorder.encode('utf-8'), |
2821 | + subclass.encode('utf-8'), |
2822 | + tclass.encode('utf-8'), |
2823 | + subphylum.encode('utf-8'), |
2824 | + phylum.encode('utf-8'), |
2825 | + superphylum.encode('utf-8'), |
2826 | + infrakingdom.encode('utf-8'), |
2827 | + subkingdom.encode('utf-8'), |
2828 | + kingdom.encode('utf-8'), |
2829 | + provider.encode('utf-8')] |
2830 | + writer.writerow(this_classification) |
2831 | + |
2832 | +def auto_subs(args): |
2833 | + """Get all OTUs to the same taxonomic level""" |
2834 | + |
2835 | + |
2836 | + verbose = args.verbose |
2837 | + input_file = args.input |
2838 | + output = args.output |
2839 | + taxonomy = args.taxonomy |
2840 | +    ignoreWarnings = args.ignoreWarnings
2841 | +    overwrite = args.overwrite
2842 | + if (os.path.exists(output) and not overwrite): |
2843 | + print "Output Phyml file exists. Either remove the file or use the --overwrite flag." |
2844 | + print "Do you wish to continue and overwrite the file anyway?? [Y/n]" |
2845 | + while True: |
2846 | + k=inkey() |
2847 | + if k.lower() == 'n': |
2848 | + print "Exiting..." |
2849 | + sys.exit(0) |
2850 | + if k.lower() == 'y': |
2851 | + break |
2852 | + |
2853 | + XML = supertree_toolkit.load_phyml(input_file) |
2854 | + taxonomy = supertree_toolkit.load_taxonomy(taxonomy) # load it in and create the dictionary |
2855 | + |
2856 | + try: |
2857 | + newXML = supertree_toolkit.generate_species_level_data(XML,taxonomy,verbose=verbose,ignoreWarnings=ignoreWarnings) |
2858 | + except NotUniqueError as detail: |
2859 | + msg = "***Error: Failed to carry out auto subs.\n"+detail.msg |
2860 | + print msg |
2861 | + return |
2862 | + except InvalidSTKData as detail: |
2863 | + msg = "***Error: Failed to carry out auto subs.\n"+detail.msg |
2864 | + print msg |
2865 | + return |
2866 | + except UninformativeTreeError as detail: |
2867 | + msg = "***Error: Failed to carry out auto subs.\n"+detail.msg |
2868 | + print msg |
2869 | + return |
2870 | + except TreeParseError as detail: |
2871 | + msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg |
2872 | + print msg |
2873 | + return |
2874 | + except NoneCompleteTaxonomy as detail: |
2875 | + msg = "***Error: Failed to carry out auto subs.\n"+detail.msg |
2876 | + print msg |
2877 | + return |
2878 | + except: |
2879 | + # what about no internet connection? What error does that throw? |
2880 | + msg = "***Error: failed to carry out auto subs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" |
2881 | + print msg |
2882 | + traceback.print_exc() |
2883 | + return |
2884 | + |
2885 | + f = open(output,"w") |
2886 | + f.write(newXML) |
2887 | + f.close() |
2888 | + |
2889 | +def process(args): |
2890 | + |
2891 | + verbose = args.verbose |
2892 | + input_file = args.input |
2893 | + output = args.output |
2894 | + no_store = args.no_store |
2895 | + ignoreWarnings = args.ignoreWarnings |
2896 | + taxonomy_file = args.taxonomy_file |
2897 | + equivalents_file = args.equivalents_file |
2898 | + overwrite = args.overwrite |
2899 | + |
2900 | + if (os.path.exists(output) and not overwrite): |
2901 | + print "Output matrix file exists. Either remove the file or use the --overwrite flag." |
2902 | + print "Do you wish to continue and overwrite the file anyway? [Y/n]" |
2903 | + while True: |
2904 | + k=inkey() |
2905 | + if k.lower() == 'n': |
2906 | + print "Exiting..." |
2907 | + sys.exit(0) |
2908 | + if k.lower() == 'y': |
2909 | + break |
2910 | + |
2911 | + filename = os.path.basename(input_file) |
2912 | + dirname = os.path.dirname(input_file) |
2913 | + |
2914 | + if verbose: |
2915 | + print "Loading and checking your data" |
2916 | + # 0) load and check data |
2917 | + try: |
2918 | + phyml = supertree_toolkit.load_phyml(input_file) |
2919 | + project_name = supertree_toolkit.get_project_name(phyml) |
2920 | + supertree_toolkit._check_data(phyml) |
2921 | + except NotUniqueError as detail: |
2922 | + msg = "***Error: Failed to load data.\n"+detail.msg |
2923 | + print msg |
2924 | + return |
2925 | + except InvalidSTKData as detail: |
2926 | + msg = "***Error: Failed to load data.\n"+detail.msg |
2927 | + print msg |
2928 | + return |
2929 | + except UninformativeTreeError as detail: |
2930 | + msg = "***Error: Failed to load data.\n"+detail.msg |
2931 | + print msg |
2932 | + return |
2933 | + except TreeParseError as detail: |
2934 | + msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg |
2935 | + print msg |
2936 | + return |
2937 | + except: |
2938 | + msg = "***Error: Failed to load input due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2939 | + print msg |
2940 | + traceback.print_exc() |
2941 | + return |
2942 | + |
2943 | + if verbose: |
2944 | + print "Checking taxa against online databases" |
2945 | + # 1) taxonomy checker with autoreplace |
2946 | + # Load existing data if any: |
2947 | + if (not equivalents_file == None): |
2948 | + equivalents = supertree_toolkit.load_equivalents(equivalents_file) |
2949 | + else: |
2950 | + equivalents = None |
2951 | + equivalents = supertree_toolkit.taxonomic_checker(phyml,existing_data=equivalents,verbose=verbose) |
2952 | + # save the equivalents for later (as CSV and as sub file) |
2953 | + data_string_csv = _equivalents_to_csv(equivalents) |
2954 | + data_string_subs = _equivalents_to_subs(equivalents) |
2955 | + f = open(os.path.join(dirname,project_name+"_taxonomy_checker.csv"), "w") |
2956 | + f.write(data_string_csv) |
2957 | + f.close() |
2958 | + f = open(os.path.join(dirname,project_name+"_taxonomy_check_subs.dat"), "w") |
2959 | + f.write(data_string_subs) |
2960 | + f.close() |
2961 | + |
2962 | + # now do the replacements - we use the subs file :) |
2963 | + if verbose: |
2964 | + print "Swapping in the corrected taxa names" |
2965 | + try: |
2966 | + old_taxa, new_taxa = supertree_toolkit.parse_subs_file(os.path.join(dirname,project_name+"_taxonomy_check_subs.dat")) |
2967 | + except UnableToParseSubsFile as e: |
2968 | + print e.msg |
2969 | + sys.exit(-1) |
2970 | + try: |
2971 | + phyml = supertree_toolkit.substitute_taxa(phyml,old_taxa,new_taxa,only_existing=False,verbose=verbose) |
2972 | + except NotUniqueError as detail: |
2973 | + msg = "***Error: Failed substituting taxa.\n"+detail.msg |
2974 | + print msg |
2975 | + return |
2976 | + except InvalidSTKData as detail: |
2977 | + msg = "***Error: Failed substituting taxa.\n"+detail.msg |
2978 | + print msg |
2979 | + return |
2980 | + except UninformativeTreeError as detail: |
2981 | + msg = "***Error: Failed substituting taxa.\n"+detail.msg |
2982 | + print msg |
2983 | + return |
2984 | + except TreeParseError as detail: |
2985 | + msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg |
2986 | + print msg |
2987 | + return |
2988 | + except: |
2989 | + msg = "***Error: Failed substituting taxa due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
2990 | + print msg |
2991 | + traceback.print_exc() |
2992 | + return |
2993 | + # save phyml as intermediate step |
2994 | + f = open(os.path.join(dirname,project_name+"_taxonomy_checked.phyml"), "w") |
2995 | + f.write(phyml) |
2996 | + f.close() |
2997 | + |
2998 | + |
2999 | + if verbose: |
3000 | + print "Creating taxonomic information" |
3001 | + # 2) create taxonomy |
3002 | + if (not taxonomy_file == None): |
3003 | + taxonomy = supertree_toolkit.load_taxonomy(taxonomy_file) |
3004 | + else: |
3005 | + taxonomy = None |
3006 | + taxonomy = supertree_toolkit.create_taxonomy(phyml,existing_taxonomy=taxonomy,verbose=verbose) |
3007 | + # save the taxonomy for later |
3008 | + # Now create the CSV output - separate out into a function in STK (used several times) |
3009 | + with open(os.path.join(dirname,project_name+"_taxonomy.csv"), 'w') as f: |
3010 | + writer = csv.writer(f) |
3011 | + headers = [] |
3012 | + headers.append("OTU") |
3013 | + headers.extend(supertree_toolkit.taxonomy_levels) |
3014 | + headers.append("Data source") |
3015 | + writer.writerow(headers) |
3016 | + for t in taxonomy: |
3017 | + otu = t |
3018 | + try: |
3019 | + species = taxonomy[t]['species'] |
3020 | + except KeyError: |
3021 | + species = "-" |
3022 | + try: |
3023 | + subgenus = taxonomy[t]['subgenus'] |
3024 | + except KeyError: |
3025 | + subgenus = "-" |
3026 | + try: |
3027 | + genus = taxonomy[t]['genus'] |
3028 | + except KeyError: |
3029 | + genus = "-" |
3030 | + try: |
3031 | + subfamily = taxonomy[t]['subfamily'] |
3032 | + except KeyError: |
3033 | + subfamily = "-" |
3034 | + try: |
3035 | + family = taxonomy[t]['family'] |
3036 | + except KeyError: |
3037 | + family = "-" |
3038 | + try: |
3039 | + superfamily = taxonomy[t]['superfamily'] |
3040 | + except KeyError: |
3041 | + superfamily = "-" |
3042 | + try: |
3043 | + subsection = taxonomy[t]['subsection'] |
3044 | + except KeyError: |
3045 | + subsection = "-" |
3046 | + try: |
3047 | + section = taxonomy[t]['section'] |
3048 | + except KeyError: |
3049 | + section = "-" |
3050 | + try: |
3051 | + infraorder = taxonomy[t]['infraorder'] |
3052 | + except KeyError: |
3053 | + infraorder = "-" |
3054 | + try: |
3055 | + suborder = taxonomy[t]['suborder'] |
3056 | + except KeyError: |
3057 | + suborder = "-" |
3058 | + try: |
3059 | + order = taxonomy[t]['order'] |
3060 | + except KeyError: |
3061 | + order = "-" |
3062 | + try: |
3063 | + superorder = taxonomy[t]['superorder'] |
3064 | + except KeyError: |
3065 | + superorder = "-" |
3066 | + try: |
3067 | + subclass = taxonomy[t]['subclass'] |
3068 | + except KeyError: |
3069 | + subclass = "-" |
3070 | + try: |
3071 | + tclass = taxonomy[t]['class'] |
3072 | + except KeyError: |
3073 | + tclass = "-" |
3074 | + try: |
3075 | + superclass = taxonomy[t]['superclass'] |
3076 | + except KeyError: |
3077 | + superclass = "-" |
3078 | + try: |
3079 | + subphylum = taxonomy[t]['subphylum'] |
3080 | + except KeyError: |
3081 | + subphylum = "-" |
3082 | + try: |
3083 | + phylum = taxonomy[t]['phylum'] |
3084 | + except KeyError: |
3085 | + phylum = "-" |
3086 | + try: |
3087 | + superphylum = taxonomy[t]['superphylum'] |
3088 | + except KeyError: |
3089 | + superphylum = "-" |
3090 | + try: |
3091 | + infrakingdom = taxonomy[t]['infrakingdom'] |
3092 | + except: |
3093 | + infrakingdom = "-" |
3094 | + try: |
3095 | + subkingdom = taxonomy[t]['subkingdom'] |
3096 | + except: |
3097 | + subkingdom = "-" |
3098 | + try: |
3099 | + kingdom = taxonomy[t]['kingdom'] |
3100 | + except KeyError: |
3101 | + kingdom = "-" |
3102 | + try: |
3103 | + provider = taxonomy[t]['provider'] |
3104 | + except KeyError: |
3105 | + provider = "-" |
3106 | + this_classification = [ |
3107 | + otu.encode('utf-8'), |
3108 | + species.encode('utf-8'), |
3109 | + subgenus.encode('utf-8'), |
3110 | + genus.encode('utf-8'), |
3111 | + subfamily.encode('utf-8'), |
3112 | + family.encode('utf-8'), |
3113 | + superfamily.encode('utf-8'), |
3114 | + subsection.encode('utf-8'), |
3115 | + section.encode('utf-8'), |
3116 | + infraorder.encode('utf-8'), |
3117 | + suborder.encode('utf-8'), |
3118 | + order.encode('utf-8'), |
3119 | + superorder.encode('utf-8'), |
3120 | + subclass.encode('utf-8'), |
3121 | + tclass.encode('utf-8'), |
3122 | + superclass.encode('utf-8'), |
3123 | + subphylum.encode('utf-8'), |
3124 | + phylum.encode('utf-8'), |
3125 | + superphylum.encode('utf-8'), |
3126 | + infrakingdom.encode('utf-8'), |
3127 | + subkingdom.encode('utf-8'), |
3128 | + kingdom.encode('utf-8'), |
3129 | + provider.encode('utf-8')] |
3130 | + writer.writerow(this_classification) |
3131 | + |
3132 | + # 3) create species level dataset |
3133 | + if verbose: |
3134 | + print "Converting data to species level" |
3135 | + try: |
3136 | + phyml = supertree_toolkit.generate_species_level_data(phyml,taxonomy,verbose=verbose) |
3137 | + except NotUniqueError as detail: |
3138 | + msg = "***Error: Failed to carry out auto subs.\n"+detail.msg |
3139 | + print msg |
3140 | + return |
3141 | + except InvalidSTKData as detail: |
3142 | + msg = "***Error: Failed to carry out auto subs.\n"+detail.msg |
3143 | + print msg |
3144 | + return |
3145 | + except UninformativeTreeError as detail: |
3146 | + msg = "***Error: Failed to carry out auto subs.\n"+detail.msg |
3147 | + print msg |
3148 | + return |
3149 | + except TreeParseError as detail: |
3150 | + msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg |
3151 | + print msg |
3152 | + return |
3153 | + except NoneCompleteTaxonomy as detail: |
3154 | + msg = "***Error: Failed to carry out auto subs.\n"+detail.msg |
3155 | + print msg |
3156 | + return |
3157 | + except: |
3158 | + # what about no internet connection? What error does that throw? |
3159 | + msg = "***Error: failed to carry out auto subs due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit" |
3160 | + print msg |
3161 | + traceback.print_exc() |
3162 | + return |
3163 | + # save the phyml as intermediate step |
3164 | + f = open(os.path.join(dirname,project_name+"_species_level.phyml"), "w") |
3165 | + f.write(phyml) |
3166 | + f.close() |
3167 | + |
3168 | + # 4) Remove non-monophyletic taxa (requires TNT to be installed) |
3169 | + if verbose: |
3170 | + print "Removing non-monophyletic taxa via mini-supertree method" |
3171 | + tree_list = supertree_toolkit._find_trees_for_permuting(phyml) |
3172 | + try: |
3173 | + for t in tree_list: |
3174 | + # permute |
3175 | + output_string = supertree_toolkit.permute_tree(tree_list[t],matrix='hennig',treefile=None,verbose=verbose) |
3176 | + #save |
3177 | + if (not output_string == ""): |
3178 | + file_name = os.path.basename(filename) |
3179 | + dirname = os.path.dirname(filename) |
3180 | + new_output = os.path.join(dirname,t,t+"_matrix.tnt") |
3181 | + try: |
3182 | + os.makedirs(os.path.join(dirname,t)) |
3183 | + except OSError: |
3184 | + if not os.path.isdir(os.path.join(dirname,t)): |
3185 | + raise |
3186 | + f = open(new_output,'w',0) |
3187 | + f.write(output_string) |
3188 | + f.close() |
3189 | + time.sleep(1) |
3190 | + |
3191 | + # now create the tnt command to deal with this |
3192 | + # create a tmp file for the output tree |
3193 | + temp_file_handle, temp_file = tempfile.mkstemp(suffix=".tnt") |
3194 | + tnt_command = "tnt mxram 512,run "+new_output+",echo= ,timeout 00:10:00,rseed0,rseed*,hold 1000,xmult= level 0,taxname=,nelsen *,tsave *"+temp_file+",save /,quit" |
3195 | + #tnt_command = "tnt run "+new_output+",ienum,taxname=,nelsen*,tsave *"+temp_file+",save /,quit" |
3196 | + # run tnt, grab the output and store back in the data |
3197 | + #try: |
3198 | + call(tnt_command, shell=True) |
3199 | + #except CalledProcessError as e: |
3200 | + # msg = "***Error: Failed to run TNT. Is it installed correctly?\n"+e.msg |
3201 | + # print msg |
3202 | + # return |
3203 | + #ret = os.system(tnt_command) |
3204 | + #if (not ret == 0): |
3205 | + # print "error running tnt" |
3206 | + # return |
3207 | + |
3208 | + new_tree = supertree_toolkit.import_tree(temp_file) |
3209 | + phyml = supertree_toolkit._swap_tree_in_XML(phyml,new_tree,t) |
3210 | + |
3211 | + except TreeParseError as e: |
3212 | + msg = "***Error permuting trees.\n"+e.msg |
3213 | + print msg |
3214 | + return |
3215 | + |
3216 | + #4.5) remove MRP_Outgroups |
3217 | + phyml = supertree_toolkit.substitute_taxa(phyml,'MRP_Outgroup') |
3218 | + phyml = supertree_toolkit.substitute_taxa(phyml,'MRPOutgroup') |
3219 | + phyml = supertree_toolkit.substitute_taxa(phyml,'MRP_outgroup') |
3220 | + phyml = supertree_toolkit.substitute_taxa(phyml,'MRPoutgroup') |
3221 | + phyml = supertree_toolkit.substitute_taxa(phyml,'MRPOUTGROUP') |
3222 | + |
3223 | + # save intermediate phyml |
3224 | + f = open(os.path.join(dirname,project_name+"_nonmonophyl_removed.phyml"), "w") |
3225 | + f.write(phyml) |
3226 | + f.close() |
3227 | + |
3228 | + |
3229 | + # 5) Remove common names |
3230 | + # no function to do this yet... |
3231 | + |
3232 | + # 6) Data independance |
3233 | + if verbose: |
3234 | + print "Checking data independence" |
3235 | + data_ind,subsets,phyml = supertree_toolkit.data_independence(phyml,make_new_xml=True) |
3236 | + # save phyml |
3237 | + f = open(os.path.join(dirname,project_name+"_data_ind.phyml"), "w") |
3238 | + f.write(phyml) |
3239 | + f.close() |
3240 | + |
3241 | + # 7) Data overlap |
3242 | + if verbose: |
3243 | + print "Checking data overlap" |
3244 | + sufficient_overlap, key_list = supertree_toolkit.data_overlap(phyml,verbose=verbose) |
3245 | + # process the key_list to remove the unconnected trees |
3246 | + if not sufficient_overlap: |
3247 | + # we don't have enough overlap, so remove all but the largest group. |
3248 | + # the key contains a list, with the largest group first (thanks networkX!) |
3249 | + # we can therefore just remove trees from everything but the first in the list |
3250 | + delete_me = [] |
3251 | + for t in key_list[1::]: # skip 0 |
3252 | + delete_me.extend(t) |
3253 | + for tree in delete_me: |
3254 | + phyml = supertree_toolkit._swap_tree_in_XML(phyml, None, tree, delete=True) # delete the tree and clean the data as we go |
3255 | + # save phyml |
3256 | + f = open(os.path.join(dirname,project_name+"_data_tax_overlap.phyml"), "w") |
3257 | + f.write(phyml) |
3258 | + f.close() |
3259 | + |
3260 | + |
3261 | + # 8) Create matrix |
3262 | + if verbose: |
3263 | + print "Creating matrix" |
3264 | + try: |
3265 | + matrix = supertree_toolkit.create_matrix(phyml) |
3266 | + except NotUniqueError as detail: |
3267 | + msg = "***Error: Failed to create matrix.\n"+detail.msg |
3268 | + print msg |
3269 | + return |
3270 | + except InvalidSTKData as detail: |
3271 | + msg = "***Error: Failed to create matrix.\n"+detail.msg |
3272 | + print msg |
3273 | + return |
3274 | + except UninformativeTreeError as detail: |
3275 | + msg = "***Error: Failed to create matrix.\n"+detail.msg |
3276 | + print msg |
3277 | + return |
3278 | + except TreeParseError as detail: |
3279 | + msg = "***Error: failed to parse a tree in your data set.\n"+detail.msg |
3280 | + print msg |
3281 | + return |
3282 | + except: |
3283 | + msg = "***Error: Failed to create matrix due to unknown error. File a bug report, please!\nhttps://bugs.launchpad.net/supertree-toolkit\n" |
3284 | + print msg |
3285 | + traceback.print_exc() |
3286 | + return |
3287 | + |
3288 | + f = open(output, "w") |
3289 | + f.write(matrix) |
3290 | + f.close() |
3291 | + |
3292 | + return |
3293 | + |
3294 | + |
3295 | +def _equivalents_to_csv(equivalents): |
3296 | + |
3297 | + output_string = 'Taxa,Equivalents,Status\n' |
3298 | + |
3299 | + for taxon in sorted(equivalents): |
3300 | + output_string += taxon + "," + ';'.join(equivalents[taxon][0]) + "," + equivalents[taxon][1] + "\n" |
3301 | + |
3302 | + return output_string |
3303 | + |
3304 | + |
3305 | +def _equivalents_to_subs(equivalents): |
3306 | + """Only corrects the yellow ones. Red and green are left alone""" |
3307 | + |
3308 | + output_string = "" |
3309 | + for taxon in sorted(equivalents): |
3310 | + if (equivalents[taxon][1] == 'yellow'): |
3311 | + # the first name is always the correct one |
3312 | + output_string += taxon + " = "+equivalents[taxon][0][0]+"\n" |
3313 | + return output_string |
3314 | |
3315 | if __name__ == "__main__": |
3316 | main() |
3317 | |
3318 | === modified file 'stk/stk_exceptions.py' |
3319 | --- stk/stk_exceptions.py 2013-10-22 08:26:54 +0000 |
3320 | +++ stk/stk_exceptions.py 2017-01-12 09:27:31 +0000 |
3321 | @@ -134,4 +134,12 @@ |
3322 | def __init__(self, msg): |
3323 | self.msg = msg |
3324 | |
3325 | +class NoneCompleteTaxonomy(Error): |
3326 | + """Exception raised when a taxonomy is not complete for these data |
3327 | + Attributes: |
3328 | + msg -- explanation of error |
3329 | + """ |
3330 | + |
3331 | + def __init__(self, msg): |
3332 | + self.msg = msg |
3333 | |
3334 | |
3335 | === modified file 'stk/supertree_toolkit.py' |
3336 | --- stk/supertree_toolkit.py 2017-01-11 15:16:21 +0000 |
3337 | +++ stk/supertree_toolkit.py 2017-01-12 09:27:31 +0000 |
3338 | @@ -44,15 +44,49 @@ |
3339 | import unicodedata |
3340 | from stk_internals import * |
3341 | from copy import deepcopy |
3342 | +import Queue |
3343 | +import threading |
3344 | +import urllib2 |
3345 | +from urllib import quote_plus |
3346 | +import simplejson as json |
3347 | +import time |
3348 | import types |
3349 | |
3350 | #plt.ion() |
3351 | |
3352 | +sys.setrecursionlimit(50000) |
3353 | # GLOBAL VARIABLES |
3354 | IDENTICAL = 0 |
3355 | SUBSET = 1 |
3356 | PLATFORM = sys.platform |
3357 | -taxonomy_levels = ['species','genus','family','superfamily','infraorder','suborder','order','superorder','subclass','class','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] |
3358 | +#Logging |
3359 | +import logging |
3360 | +logging.basicConfig(filename='supertreetoolkit.log', level=logging.DEBUG, format='%(asctime)s %(levelname)s:%(message)s', datefmt='%m/%d/%Y %I:%M:%S %p') |
3361 | + |
3362 | +# taxonomy levels |
3363 | +# What we get from EOL |
3364 | +current_taxonomy_levels = ['species','genus','family','order','class','phylum','kingdom'] |
3365 | +# And the extra ones from ITIS |
3366 | +extra_taxonomy_levels = ['superfamily','infraorder','suborder','superorder','subclass','subphylum','superphylum','infrakingdom','subkingdom'] |
3367 | +# all of them in order |
3368 | +taxonomy_levels = ['species','subgenus','genus','tribe','subfamily','family','superfamily','subsection','section','parvorder','infraorder','suborder','order','superorder','subclass','class','superclass','subphylum','phylum','superphylum','infrakingdom','subkingdom','kingdom'] |
3369 | + |
3370 | +SPECIES = taxonomy_levels[0] |
3371 | +GENUS = taxonomy_levels[2] |
3372 | +FAMILY = taxonomy_levels[5] |
3373 | +SUPERFAMILY = taxonomy_levels[6] |
3374 | +INFRAORDER = taxonomy_levels[10] |
3375 | +SUBORDER = taxonomy_levels[11] |
3376 | +ORDER = taxonomy_levels[12] |
3377 | +SUPERORDER = taxonomy_levels[13] |
3378 | +SUBCLASS = taxonomy_levels[14] |
3379 | +CLASS = taxonomy_levels[15] |
3380 | +SUBPHYLUM = taxonomy_levels[17] |
3381 | +PHYLUM = taxonomy_levels[18] |
3382 | +SUPERPHYLUM = taxonomy_levels[19] |
3383 | +INFRAKINGDOM = taxonomy_levels[20] |
3384 | +SUBKINGDOM = taxonomy_levels[21] |
3385 | +KINGDOM = taxonomy_levels[22] |
3386 | |
3387 | # supertree_toolkit is the backend for the STK. Loaded by both the GUI and |
3388 | # CLI, this contains all the functions to actually *do* something |
3389 | @@ -60,6 +94,17 @@ |
3390 | # All functions take XML and a list of other arguments, process the data and return |
3391 | # it back to the user interface handler to save it somewhere |
3392 | |
3393 | + |
3394 | +def get_project_name(XML): |
3395 | + """ |
3396 | + Get the name of the dataset currently being worked on |
3397 | + """ |
3398 | + |
3399 | + xml_root = _parse_xml(XML) |
3400 | + |
3401 | + return xml_root.xpath('/phylo_storage/project_name/string_value')[0].text |
3402 | + |
3403 | + |
3404 | def create_name(authors, year, append=''): |
3405 | """ |
3406 | Construct a sensible from a list of authors and a year for a |
3407 | @@ -161,6 +206,22 @@ |
3408 | |
3409 | return names |
3410 | |
3411 | +def get_all_tree_names(XML): |
3412 | + """ From a full XML-PHYML string, extract all tree names. |
3413 | + """ |
3414 | + |
3415 | + xml_root = _parse_xml(XML) |
3416 | + find = etree.XPath("//source") |
3417 | + sources = find(xml_root) |
3418 | + names = [] |
3419 | + for s in sources: |
3420 | + for st in s.xpath("source_tree"): |
3421 | + if 'name' in st.attrib and not st.attrib['name'] == "": |
3422 | + names.append(st.attrib['name']) |
3423 | + |
3424 | + return names |
3425 | + |
3426 | + |
3427 | def set_unique_names(XML): |
3428 | """ Ensures all sources have unique names. |
3429 | """ |
3430 | @@ -249,9 +310,17 @@ |
3431 | if (ele.tag == "source"): |
3432 | sources.append(ele) |
3433 | |
3434 | + if overwrite: |
3435 | + # remove all the names first |
3436 | + for s in sources: |
3437 | + for st in s.xpath("source_tree"): |
3438 | + if 'name' in st.attrib: |
3439 | + del st.attrib['name'] |
3440 | + |
3441 | + |
3442 | for s in sources: |
3443 | for st in s.xpath("source_tree"): |
3444 | - if overwrite or not 'name' in st.attrib: |
3445 | + if not 'name' in st.attrib: |
3446 | tree_name = create_tree_name(XML,st) |
3447 | st.attrib['name'] = tree_name |
3448 | |
3449 | @@ -339,7 +408,7 @@ |
3450 | taxa = etree.SubElement(s_tree,"taxa_data") |
3451 | taxa.tail="\n " |
3452 | # Note: we do not add all elements as otherwise they get set to some option |
3453 | - # rather than remaining blank (and hence blue int he interface) |
3454 | + # rather than remaining blank (and hence blue in the interface) |
3455 | |
3456 | # append our new source to the main tree |
3457 | # if sources has no valid source, overwrite, |
3458 | @@ -877,7 +946,7 @@ |
3459 | # Need to add checks on the file. Problems include: |
3460 | # TNT: outputs Phyllip format or something - basically a Newick |
3461 | # string without commas, so add 'em back in |
3462 | - m = re.search(r'proc-;', content) |
3463 | + m = re.search(r'proc.;', content) |
3464 | if (m != None): |
3465 | # TNT output tree |
3466 | # Done on a Mac? Replace ^M with a newline |
3467 | @@ -1402,6 +1471,36 @@ |
3468 | |
3469 | return _amalgamate_trees(trees,format,anonymous) |
3470 | |
3471 | +def get_taxa_from_tree_for_taxonomy(tree, pretty=False, ignoreErrors=False): |
3472 | + """Returns a list of all taxa available for the tree passed as argument. |
3473 | + :param tree: string with the data for the tree in Newick format. |
3474 | + :type tree: string |
3475 | + :param pretty: defines if '_' in taxa names should be replaced with spaces. |
3476 | + :type pretty: boolean |
3477 | + :param ignoreErrors: should execution continue on error? |
3478 | + :type ignoreErrors: boolean |
3479 | + :returns: list of strings with the taxa names, sorted alphabetically |
3480 | + :rtype: list |
3481 | + """ |
3482 | + taxa_list = [] |
3483 | + |
3484 | + try: |
3485 | + taxa_list.extend(_getTaxaFromNewick(tree)) |
3486 | + except TreeParseError as detail: |
3487 | + if (ignoreErrors): |
3488 | + logging.warning(detail.msg) |
3489 | + pass |
3490 | + else: |
3491 | + raise TreeParseError( detail.msg ) |
3492 | + |
3493 | + # now uniquify the list of taxa |
3494 | + taxa_list = _uniquify(taxa_list) |
3495 | + taxa_list.sort() |
3496 | + |
3497 | + if (pretty): |
3498 | + taxa_list = [x.replace('_', ' ') for x in taxa_list] |
3499 | + |
3500 | + return taxa_list |
3501 | |
3502 | def get_all_taxa(XML, pretty=False, ignoreErrors=False): |
3503 | """ Produce a taxa list by scanning all trees within |
3504 | @@ -1422,21 +1521,17 @@ |
3505 | taxa_list.extend(_getTaxaFromNewick(t)) |
3506 | except TreeParseError as detail: |
3507 | if (ignoreErrors): |
3508 | + logging.warning(detail.msg) |
3509 | pass |
3510 | else: |
3511 | raise TreeParseError( detail.msg ) |
3512 | |
3513 | - |
3514 | - |
3515 | # now uniquify the list of taxa |
3516 | taxa_list = _uniquify(taxa_list) |
3517 | taxa_list.sort() |
3518 | |
3519 | - if (pretty): |
3520 | - unpretty_tl = taxa_list |
3521 | - taxa_list = [] |
3522 | - for t in unpretty_tl: |
3523 | - taxa_list.append(t.replace('_',' ')) |
3524 | + if (pretty): #Remove underscores from names |
3525 | + taxa_list = [x.replace('_', ' ') for x in taxa_list] |
3526 | |
3527 | return taxa_list |
3528 | |
3529 | @@ -1508,7 +1603,7 @@ |
3530 | return outgroups |
3531 | |
3532 | |
3533 | -def create_matrix(XML,format="hennig",quote=False,taxonomy=None,outgroups=False,ignoreWarnings=False): |
3534 | +def create_matrix(XML,format="hennig",quote=False,taxonomy=None,outgroups=False,ignoreWarnings=False, verbose=False): |
3535 | """ From all trees in the XML, create a matrix |
3536 | """ |
3537 | |
3538 | @@ -1553,7 +1648,7 @@ |
3539 | taxa.sort() |
3540 | taxa.insert(0,"MRP_Outgroup") |
3541 | |
3542 | - return _create_matrix(trees, taxa, format=format, quote=quote, weights=weights) |
3543 | + return _create_matrix(trees, taxa, format=format, quote=quote, weights=weights,verbose=verbose) |
3544 | |
3545 | |
3546 | def create_matrix_from_trees(trees,format="hennig"): |
3547 | @@ -1925,7 +2020,7 @@ |
3548 | _check_data(XML) |
3549 | |
3550 | xml_root = _parse_xml(XML) |
3551 | - proj_name = xml_root.xpath('/phylo_storage/project_name/string_value')[0].text |
3552 | + proj_name = get_project_name(XML) |
3553 | |
3554 | output_string = "======================\n" |
3555 | output_string += " Data summary of: " + proj_name + "\n" |
3556 | @@ -1989,6 +2084,188 @@ |
3557 | |
3558 | return output_string |
3559 | |
3560 | +def taxonomic_checker_list(name_list,existing_data=None,verbose=False): |
3561 | + """ For each name in the database generate a database of the original name, |
3562 | + possible synonyms and if the taxon is not know, signal that. We do this by |
3563 | + using the EoL API to grab synonyms of each taxon. """ |
3564 | + |
3565 | + import urllib2 |
3566 | + from urllib import quote_plus |
3567 | + import simplejson as json |
3568 | + |
3569 | + if existing_data == None: |
3570 | + equivalents = {} |
3571 | + else: |
3572 | + equivalents = existing_data |
3573 | + |
3574 | + # for each taxon, check the name on EoL - what if it's a synonym? Does EoL still return a result? |
3575 | + # if not, is there another API function to do this? |
3576 | + # search for the taxon and grab the name - if you search for a recognised synonym on EoL then |
3577 | + # you get the original ('correct') name - shorten this to two words and you're done. |
3578 | + for t in name_list: |
3579 | + if t in equivalents: |
3580 | + continue |
3581 | + taxon = t.replace("_"," ") |
3582 | + if (verbose): |
3583 | + print "Looking up ", taxon |
3584 | + # get the data from EOL on taxon |
3585 | + taxonq = quote_plus(taxon) |
3586 | + URL = "http://eol.org/api/search/1.0.json?q="+taxonq |
3587 | + req = urllib2.Request(URL) |
3588 | + opener = urllib2.build_opener() |
3589 | + f = opener.open(req) |
3590 | + data = json.load(f) |
3591 | + # check if there's some data |
3592 | + if len(data['results']) == 0: |
3593 | + equivalents[t] = [[t],'red'] |
3594 | + continue |
3595 | + amber = False |
3596 | + if len(data['results']) > 1: |
3597 | + # this is not great - we have multiple hits for this taxon - the user will need to go back and check this |
3598 | + # for automatic processing we'll just take the first one though |
3599 | + # colour is amber in this case |
3600 | + amber = True |
3601 | + ID = str(data['results'][0]['id']) # take first hit |
3602 | + URL = "http://eol.org/api/pages/1.0/"+ID+".json?images=0&videos=0&sounds=0&maps=0&text=0&iucn=false&subjects=overview&licenses=all&details=true&common_names=true&synonyms=true&references=true&vetted=0" |
3603 | + req = urllib2.Request(URL) |
3604 | + opener = urllib2.build_opener() |
3605 | + |
3606 | + try: |
3607 | + f = opener.open(req) |
3608 | + except urllib2.HTTPError: |
3609 | + equivalents[t] = [[t],'red'] |
3610 | + continue |
3611 | + data = json.load(f) |
3612 | + if len(data['scientificName']) == 0: |
3613 | + # not found a scientific name, so set as red |
3614 | + equivalents[t] = [[t],'red'] |
3615 | + continue |
3616 | + correct_name = data['scientificName'].encode("ascii","ignore") |
3617 | + # we only want the first two bits of the name, not the original author and year if any |
3618 | + temp_name = correct_name.split(' ') |
3619 | + if (len(temp_name) > 2): |
3620 | + correct_name = ' '.join(temp_name[0:2]) |
3621 | + correct_name = correct_name.replace(' ','_') |
3622 | + |
3623 | + # build up the output dictionary - original name is key, synonyms/missing is value |
3624 | + if (correct_name == t): |
3625 | + # if the original matches the 'correct', then it's green |
3626 | + equivalents[t] = [[t], 'green'] |
3627 | + else: |
3628 | + # if we managed to get something anyway, then it's yellow and create a list of possible synonyms with the |
3629 | + # 'correct' taxon at the top |
3630 | + eol_synonyms = data['synonyms'] |
3631 | + synonyms = [] |
3632 | + for s in eol_synonyms: |
3633 | + ts = s['synonym'].encode("ascii","ignore") |
3634 | + temp_syn = ts.split(' ') |
3635 | + if (len(temp_syn) > 2): |
3636 | + temp_syn = ' '.join(temp_syn[0:2]) |
3637 | + ts = temp_syn |
3638 | + if (s['relationship'] == "synonym"): |
3639 | + ts = ts.replace(" ","_") |
3640 | + synonyms.append(ts) |
3641 | + synonyms = _uniquify(synonyms) |
3642 | + # we need to put the correct name at the top of the list now |
3643 | + if (correct_name in synonyms): |
3644 | + synonyms.insert(0, synonyms.pop(synonyms.index(correct_name))) |
3645 | + elif len(synonyms) == 0: |
3646 | + synonyms.append(correct_name) |
3647 | + else: |
3648 | + synonyms.insert(0,correct_name) |
3649 | + |
3650 | + if (amber): |
3651 | + equivalents[t] = [synonyms,'amber'] |
3652 | + else: |
3653 | + equivalents[t] = [synonyms,'yellow'] |
3654 | + # if our search was empty, then it's red - see above |
3655 | + |
3656 | + # up to the calling function to do something sensible with this |
3657 | + # we build a dictionary of names and then a list of synonyms or the original name, then a tag if it's green, yellow, red. |
3658 | + # Amber means we found synonyms and multiple hits. The user definitely needs to sort these! |
3659 | + |
3660 | + return equivalents |
3661 | + |
3662 | +def taxonomic_checker_tree(tree_file,existing_data=None,verbose=False): |
3663 | + """ For each name in the database generate a database of the original name, |
3664 | + possible synonyms and if the taxon is not known, signal that. We do this by |
3665 | + using the EoL API to grab synonyms of each taxon. """ |
3666 | + |
3667 | + tree = import_tree(tree_file) |
3668 | + p4tree = _parse_tree(tree) |
3669 | + taxa = p4tree.getAllLeafNames(p4tree.root) |
3670 | + if existing_data == None: |
3671 | + equivalents = {} |
3672 | + else: |
3673 | + equivalents = existing_data |
3674 | + |
3675 | + equivalents = taxonomic_checker_list(taxa,existing_data,verbose) |
3676 | + return equivalents |
3677 | + |
3678 | +def taxonomic_checker(XML,existing_data=None,verbose=False): |
3679 | + """ For each name in the database generate a database of the original name, |
3680 | + possible synonyms and if the taxon is not known, signal that. We do this by |
3681 | + using the EoL API to grab synonyms of each taxon. """ |
3682 | + |
3683 | + # grab all taxa |
3684 | + taxa = get_all_taxa(XML) |
3685 | + |
3686 | + if existing_data == None: |
3687 | + equivalents = {} |
3688 | + else: |
3689 | + equivalents = existing_data |
3690 | + |
3691 | + equivalents = taxonomic_checker_list(taxa,existing_data,verbose) |
3692 | + return equivalents |
3693 | + |
3694 | + |
3695 | +def load_equivalents(equiv_csv): |
3696 | + """Load equivalents data from a csv and convert to a equivalents Dict. |
3697 | + Structure is key, with a list that is array of synonyms, followed by status ('green', |
3698 | + 'yellow' or 'red'). |
3699 | + |
3700 | + """ |
3701 | + |
3702 | + import csv |
3703 | + |
3704 | + equivalents = {} |
3705 | + |
3706 | + with open(equiv_csv, 'rU') as csvfile: |
3707 | + equiv_reader = csv.reader(csvfile, delimiter=',') |
3708 | + equiv_reader.next() # skip header |
3709 | + for row in equiv_reader: |
3710 | + i = 1 |
3711 | + equivalents[row[0]] = [row[1].split(';'),row[2]] |
3712 | + |
3713 | + return equivalents |
3714 | + |
3715 | +def save_taxonomy(taxonomy, output_file): |
3716 | + |
3717 | + import csv |
3718 | + |
3719 | + with open(output_file, 'w') as f: |
3720 | + writer = csv.writer(f) |
3721 | + row = ['OTU'] |
3722 | + row.extend(taxonomy_levels) |
3723 | + row.append('Provider') |
3724 | + writer.writerow(row) |
3725 | + for t in taxonomy: |
3726 | + species = t |
3727 | + row = [] |
3728 | + row.append(t.encode('utf-8')) |
3729 | + for l in taxonomy_levels: |
3730 | + try: |
3731 | + g = taxonomy[t][l] |
3732 | + except KeyError: |
3733 | + g = '-' |
3734 | + row.append(g.encode('utf-8')) |
3735 | + try: |
3736 | + provider = taxonomy[t]['provider'] |
3737 | + except KeyError: |
3738 | + provider = "-" |
3739 | + row.append(provider) |
3740 | + |
3741 | + writer.writerow(row) |
3742 | |
3743 | |
3744 | def load_taxonomy(taxonomy_csv): |
3745 | @@ -2000,20 +2277,443 @@ |
3746 | |
3747 | with open(taxonomy_csv, 'rU') as csvfile: |
3748 | tax_reader = csv.reader(csvfile, delimiter=',') |
3749 | - tax_reader.next() |
3750 | - for row in tax_reader: |
3751 | - current_taxonomy = {} |
3752 | - i = 1 |
3753 | - for t in taxonomy_levels: |
3754 | - if not row[i] == '-': |
3755 | - current_taxonomy[t] = row[i] |
3756 | - i = i+ 1 |
3757 | - |
3758 | - current_taxonomy['provider'] = row[17] # data source |
3759 | - taxonomy[row[0]] = current_taxonomy |
3760 | - |
3761 | - return taxonomy |
3762 | - |
3763 | + try: |
3764 | + j = 0 |
3765 | + for row in tax_reader: |
3766 | + if j == 0: |
3767 | + tax_levels = row[1:-1] |
3768 | + j += 1 |
3769 | + continue |
3770 | + i = 1 |
3771 | + current_taxonomy = {} |
3772 | + for t in tax_levels: |
3773 | + if not row[i] == '-': |
3774 | + current_taxonomy[t] = row[i] |
3775 | + i = i+ 1 |
3776 | + current_taxonomy['provider'] = row[-1] # data source |
3777 | + taxonomy[row[0].replace(" ","_")] = current_taxonomy |
3778 | + j += 1 |
3779 | + except: |
3780 | + pass |
3781 | + |
3782 | + return taxonomy |
3783 | + |
3784 | + |
3785 | +class TaxonomyFetcher(threading.Thread): |
3786 | + """ Class to provide the taxonomy fetching functionality as a threaded function to be used individually or working with a pool. |
3787 | + """ |
3788 | + |
3789 | + def __init__(self, taxonomy, lock, queue, id=0, pref_db=None, verbose=False, ignoreWarnings=False): |
3790 | + """ Constructor for the threaded model. |
3791 | + :param taxonomy: previously gathered taxonomy (if any) or an empty dictionary to store the results. |
3792 | + :type taxonomy: dictionary |
3793 | + :param lock: lock to keep the taxonomy threadsafe. |
3794 | + :type lock: Lock |
3795 | + :param queue: queue where the taxa are kept to be processed. |
3796 | + :type queue: Queue of strings |
3797 | + :param id: id for the thread to use if messages need to be printed. |
3798 | + :type id: int |
3799 | + :param pref_db: Gives priority to database. Seems it is unused. |
3800 | + :type pref_db: string |
3801 | + :param verbose: Show verbose messages during execution, will also define level of logging. True will set logging level to INFO. |
3802 | + :type verbose: boolean |
3803 | + :param ignoreWarnings: Ignore warnings and errors during execution? Errors will be logged with ERROR level on the logging output. |
3804 | + :type ignoreWarnings: boolean |
3805 | + """ |
3806 | + |
3807 | + threading.Thread.__init__(self) |
3808 | + self.taxonomy = taxonomy |
3809 | + self.lock = lock |
3810 | + self.queue = queue |
3811 | + self.id = id |
3812 | + self.verbose = verbose |
3813 | + self.pref_db = pref_db |
3814 | + self.ignoreWarnings = ignoreWarnings |
3815 | + |
3816 | + def run(self): |
3817 | + """ Gets and processes a taxon from the queue to get its taxonomy.""" |
3818 | + while True : |
3819 | + if self.verbose : |
3820 | + logging.getLogger().setLevel(logging.INFO) |
3821 | + #get taxon from queue |
3822 | + taxon = self.queue.get() |
3823 | + |
3824 | + logging.debug("Starting {} with thread #{} remaining ~{}".format(taxon,str(self.id),str(self.queue.qsize()))) |
3825 | + |
3826 | + #Lock access to the taxonomy |
3827 | + self.lock.acquire() |
3828 | + if not taxon in self.taxonomy: # is a new taxon, not previously in the taxonomy |
3829 | + #Release access to the taxonomy |
3830 | + self.lock.release() |
3831 | + if (self.verbose): |
3832 | + print "Looking up ", taxon |
3833 | + logging.info("Looking up taxon: {}".format(str(taxon))) |
3834 | + try: |
3835 | + # get the data from EOL on taxon |
3836 | + taxonq = quote_plus(taxon) |
3837 | + URL = "http://eol.org/api/search/1.0.json?q="+taxonq |
3838 | + req = urllib2.Request(URL) |
3839 | + opener = urllib2.build_opener() |
3840 | + f = opener.open(req) |
3841 | + data = json.load(f) |
3842 | + # check if there's some data |
3843 | + if len(data['results']) == 0: |
3844 | + # try PBDB as it might be a fossil |
3845 | + URL = "http://paleobiodb.org/data1.1/taxa/single.json?name="+taxonq+"&show=phylo&vocab=pbdb" |
3846 | + req = urllib2.Request(URL) |
3847 | + opener = urllib2.build_opener() |
3848 | + f = opener.open(req) |
3849 | + datapbdb = json.load(f) |
3850 | + if (len(datapbdb['records']) == 0): |
3851 | + # no idea! |
3852 | + with self.lock: |
3853 | + self.taxonomy[taxon] = {} |
3854 | + self.queue.task_done() |
3855 | + continue |
3856 | + # otherwise, let's fill in info here - only if extinct! |
3857 | + if datapbdb['records'][0]['is_extant'] == 0: |
3858 | + this_taxonomy = {} |
3859 | + this_taxonomy['provider'] = 'PBDB' |
3860 | + for level in taxonomy_levels: |
3861 | + try: |
3862 | + if datapbdb.has_key('records'): |
3863 | + pbdb_lev = datapbdb['records'][0][level] |
3864 | + temp_lev = pbdb_lev.split(" ") |
3865 | + # they might have the author on the end, so strip it off |
3866 | + if (level == 'species'): |
3867 | + this_taxonomy[level] = ' '.join(temp_lev[0:2]) |
3868 | + else: |
3869 | + this_taxonomy[level] = temp_lev[0] |
3870 | + except KeyError as e: |
3871 | + logging.exception("Key not found records") |
3872 | + continue |
3873 | + # add the taxon at right level too |
3874 | + try: |
3875 | + if datapbdb.has_key('records'): |
3876 | + current_level = datapbdb['records'][0]['rank'] |
3877 | + this_taxonomy[current_level] = datapbdb['records'][0]['taxon_name'] |
3878 | + except KeyError as e: |
3879 | + self.queue.task_done() |
3880 | + logging.exception("Key not found records") |
3881 | + continue |
3882 | + with self.lock: |
3883 | + self.taxonomy[taxon] = this_taxonomy |
3884 | + self.queue.task_done() |
3885 | + continue |
3886 | + else: |
3887 | + # extant, but not in EoL - leave the user to sort this one out |
3888 | + with self.lock: |
3889 | + self.taxonomy[taxon] = {} |
3890 | + self.queue.task_done() |
3891 | + continue |
3892 | + |
3893 | + |
3894 | + ID = str(data['results'][0]['id']) # take first hit |
3895 | + # Now look for taxonomies |
3896 | + URL = "http://eol.org/api/pages/1.0/"+ID+".json" |
3897 | + req = urllib2.Request(URL) |
3898 | + opener = urllib2.build_opener() |
3899 | + f = opener.open(req) |
3900 | + data = json.load(f) |
3901 | + if len(data['taxonConcepts']) == 0: |
3902 | + with self.lock: |
3903 | + self.taxonomy[taxon] = {} |
3904 | + self.queue.task_done() |
3905 | + continue |
3906 | + TID = str(data['taxonConcepts'][0]['identifier']) # take first hit |
3907 | + currentdb = str(data['taxonConcepts'][0]['nameAccordingTo']) |
3908 | + # loop through and get preferred one if specified |
3909 | + # now get taxonomy |
3910 | + if (not self.pref_db is None): |
3911 | + for db in data['taxonConcepts']: |
3912 | + currentdb = db['nameAccordingTo'].lower() |
3913 | + if (self.pref_db.lower() in currentdb): |
3914 | + TID = str(db['identifier']) |
3915 | + break |
3916 | + URL="http://eol.org/api/hierarchy_entries/1.0/"+TID+".json" |
3917 | + req = urllib2.Request(URL) |
3918 | + opener = urllib2.build_opener() |
3919 | + f = opener.open(req) |
3920 | + data = json.load(f) |
3921 | + this_taxonomy = {} |
3922 | + this_taxonomy['provider'] = currentdb |
3923 | + for a in data['ancestors']: |
3924 | + try: |
3925 | + if a.has_key('taxonRank') : |
3926 | + temp_level = a['taxonRank'].encode("ascii","ignore") |
3927 | + if (temp_level in taxonomy_levels): |
3928 | + # note the dump into ASCII |
3929 | + temp_name = a['scientificName'].encode("ascii","ignore") |
3930 | + temp_name = temp_name.split(" ") |
3931 | + if (temp_level == 'species'): |
3932 | + this_taxonomy[temp_level] = temp_name[0:2] |
3933 | + |
3934 | + else: |
3935 | + this_taxonomy[temp_level] = temp_name[0] |
3936 | + except KeyError as e: |
3937 | + logging.exception("Key not found: taxonRank") |
3938 | + continue |
3939 | + try: |
3940 | + # add taxonomy in to the taxonomy! |
3941 | + # some issues here, so let's make sure it's OK |
3942 | + temp_name = taxon.split(" ") |
3943 | + if data.has_key('taxonRank') : |
3944 | + if not data['taxonRank'].lower() == 'species': |
3945 | + this_taxonomy[data['taxonRank'].lower()] = temp_name[0] |
3946 | + else: |
3947 | + this_taxonomy[data['taxonRank'].lower()] = ' '.join(temp_name[0:2]) |
3948 | + except KeyError as e: |
3949 | + self.queue.task_done() |
3950 | + logging.exception("Key not found: taxonRank") |
3951 | + continue |
3952 | + with self.lock: |
3953 | + #Send result to dictionary |
3954 | + self.taxonomy[taxon] = this_taxonomy |
3955 | + except urllib2.HTTPError: |
3956 | + print("Network error when processing {} ".format(taxon,)) |
3957 | + logging.info("Network error when processing {} ".format(taxon,)) |
3958 | + self.queue.task_done() |
3959 | + continue |
3960 | + except urllib2.URLError: |
3961 | + print("Network error when processing {} ".format(taxon,)) |
3962 | + logging.info("Network error when processing {} ".format(taxon,)) |
3963 | + self.queue.task_done() |
3964 | + continue |
3965 | + else : |
3966 | + #Nothing to do release the lock on taxonomy |
3967 | + self.lock.release() |
3968 | + #Mark task as done |
3969 | + self.queue.task_done() |
3970 | + |
3971 | +def create_taxonomy_from_taxa(taxa, taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False, threadNumber=5): |
3972 | + """Uses the taxa provided to generate a taxonomy for all the taxon available. |
3973 | + :param taxa: list of the taxa. |
3974 | + :type taxa : list |
3975 | + :param taxonomy: previously gathered taxonomy (if any) or an empty |
3976 | + dictionary to store the results. If None, it will be initialised to an empty dictionary. |
3977 | + :type taxonomy: dictionary |
3978 | + :param pref_db: Gives priority to database. Seems it is unused. |
3979 | + :type pref_db: string |
3980 | + :param verbose: Show verbose messages during execution, will also define |
3981 | + level of logging. True will set logging level to INFO. |
3982 | + :type verbose: boolean |
3983 | + :param ignoreWarnings: Ignore warnings and errors during execution? Errors |
3984 | + will be logged with ERROR level on the logging output. |
3985 | + :type ignoreWarnings: boolean |
3986 | + :param threadNumber: Maximum number of threads to use for taxonomy processing. |
3987 | + :type threadNumber: int |
3988 | + :returns: dictionary with resulting taxonomy for each taxon (keys) |
3989 | + :rtype: dictionary |
3990 | + """ |
3991 | + if verbose : |
3992 | + logging.getLogger().setLevel(logging.INFO) |
3993 | + if taxonomy is None: |
3994 | + taxonomy = {} |
3995 | + |
3996 | + lock = threading.Lock() |
3997 | + queue = Queue.Queue() |
3998 | + |
3999 | + #Starting a few threads as daemons checking the queue |
4000 | + for i in range(threadNumber) : |
4001 | + t = TaxonomyFetcher(taxonomy, lock, queue, i, pref_db, verbose, ignoreWarnings) |
4002 | + t.setDaemon(True) |
4003 | + t.start() |
4004 | + |
4005 | + #Populate the queue with the taxa. |
4006 | + for taxon in taxa : |
4007 | + queue.put(taxon) |
4008 | + |
4009 | + #Wait till everyone finishes |
4010 | + queue.join() |
4011 | + logging.getLogger().setLevel(logging.WARNING) |
4012 | + |
4013 | +def create_taxonomy_from_tree(tree, existing_taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False): |
4014 | + """ Generates the taxonomy from a tree. Uses a similar method to the XML version but works directly on a string with the tree. |
4015 | + :param tree: list of the taxa. |
4016 | + :type tree : list |
4017 | + :param existing_taxonomy: list of the taxa. |
4018 | + :type existing_taxonomy: list |
4019 | + :param pref_db: Gives priority to database. Seems it is unused. |
4020 | + :type pref_db: string |
4021 | + :param verbose: Flag for verbosity. |
4022 | + :type verbose: boolean |
4023 | + :param ignoreWarnings: Flag for exception processing. |
4024 | + :type ignoreWarnings: boolean |
4025 | + :returns: the modified taxonomy |
4026 | + :rtype: dictionary |
4027 | + """ |
4028 | + starttime = time.time() |
4029 | + |
4030 | + if(existing_taxonomy is None) : |
4031 | + taxonomy = {} |
4032 | + else : |
4033 | + taxonomy = existing_taxonomy |
4034 | + |
4035 | + taxa = get_taxa_from_tree_for_taxonomy(tree, pretty=True) |
4036 | + |
4037 | + create_taxonomy_from_taxa(taxa, taxonomy) |
4038 | + |
4039 | + taxonomy = create_extended_taxonomy(taxonomy, starttime, verbose, ignoreWarnings) |
4040 | + |
4041 | + return taxonomy |
4042 | + |
4043 | +def create_taxonomy(XML, existing_taxonomy=None, pref_db=None, verbose=False, ignoreWarnings=False): |
4044 | + """Generates a taxonomy of the data from EoL data. This is stored as a |
4045 | + dictionary of taxonomy for each taxon in the dataset. Missing data are |
4046 | + encoded as '' (blank string). It's up to the calling function to store this |
4047 | + data to file or display it.""" |
4048 | + |
4049 | + starttime = time.time() |
4050 | + |
4051 | + if not ignoreWarnings: |
4052 | + _check_data(XML) |
4053 | + |
4054 | + if (existing_taxonomy is None): |
4055 | + taxonomy = {} |
4056 | + else: |
4057 | + taxonomy = existing_taxonomy |
4058 | + taxa = get_all_taxa(XML, pretty=True) |
4059 | + create_taxonomy_from_taxa(taxa, taxonomy) |
4060 | + #taxonomy = create_extended_taxonomy(taxonomy, starttime, verbose, ignoreWarnings) |
4061 | + return taxonomy |
4062 | + |
4063 | +def create_extended_taxonomy(taxonomy, starttime, verbose=False, ignoreWarnings=False): |
4064 | + """Bring extra taxonomy terms from other databases, shared method for completing the taxonomy |
4065 | + both for trees coming from XML or directly from trees. |
4066 | + :param taxonomy: Dictionary with the relationship for taxa and taxonomy terms. |
4067 | + :type taxonomy: dictionary |
4068 | + :param starttime: time to keep track of processing time. |
4069 | + :type starttime: long |
4070 | + :param verbose: Flag for verbosity. |
4071 | + :type verbose: boolean |
4072 | + :param ignoreWarnings: Flag for exception processing. |
4073 | + :type ignoreWarnings: boolean |
4074 | + :returns: the modified taxonomy |
4075 | + :rtype: dictionary |
4076 | + """ |
4077 | + |
4078 | + if (verbose): |
4079 | + logging.info('Done basic taxonomy, getting more info from ITIS') |
4080 | + print("Time elapsed {}".format(str(time.time() - starttime))) |
4081 | + print "Done basic taxonomy, getting more info from ITIS" |
4082 | + # fill in the rest of the taxonomy |
4083 | + # get all genera |
4084 | + genera = [] |
4085 | + for t in taxonomy: |
4086 | + if t in taxonomy: |
4087 | + if GENUS in taxonomy[t]: |
4088 | + genera.append(taxonomy[t][GENUS]) |
4089 | + genera = _uniquify(genera) |
4090 | + # We then use ITIS to fill in missing info based on the genera only - that saves us a species level search |
4091 | + # and we can fill in most of the EoL missing data |
4092 | + for g in genera: |
4093 | + if (verbose): |
4094 | + print "Looking up ", g |
4095 | + logging.info("Looking up {}".format(str(g))) |
4096 | + try: |
4097 | + URL="http://www.itis.gov/ITISWebService/jsonservice/searchByScientificName?srchKey="+quote_plus(g.strip()) |
4098 | + except: |
4099 | + continue |
4100 | + req = urllib2.Request(URL) |
4101 | + opener = urllib2.build_opener() |
4102 | + try: |
4103 | + f = opener.open(req) |
4104 | + except urllib2.HTTPError: |
4105 | + continue |
4106 | + string = unicode(f.read(),"ISO-8859-1") |
4107 | + data = json.loads(string) |
4108 | + if data['scientificNames'][0] == None: |
4109 | + continue |
4110 | + tsn = data["scientificNames"][0]["tsn"] |
4111 | + URL="http://www.itis.gov/ITISWebService/jsonservice/getFullHierarchyFromTSN?tsn="+str(tsn) |
4112 | + req = urllib2.Request(URL) |
4113 | + opener = urllib2.build_opener() |
4114 | + f = opener.open(req) |
4115 | + try: |
4116 | + string = unicode(f.read(),"ISO-8859-1") |
4117 | + except: |
4118 | + continue |
4119 | + data = json.loads(string) |
4120 | + this_taxonomy = {} |
4121 | + for level in data['hierarchyList']: |
4122 | + if not level['rankName'].lower() in current_taxonomy_levels: |
4123 | + # note the dump into ASCII |
4124 | + if level['rankName'].lower() == 'species': |
4125 | + this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = ' '.join(level['taxonName'].split(' ')[0:2]).encode("ascii","ignore") |
4126 | + else: |
4127 | + this_taxonomy[level['rankName'].lower().encode("ascii","ignore")] = level['taxonName'].encode("ascii","ignore") |
4128 | + |
4129 | + for t in taxonomy: |
4130 | + if t in taxonomy: |
4131 | + if GENUS in taxonomy[t]: |
4132 | + if taxonomy[t][GENUS] == g: |
4133 | + taxonomy[t].update(this_taxonomy) |
4134 | + |
4135 | + return taxonomy |
4136 | + |
4137 | +def generate_species_level_data(XML, taxonomy, ignoreWarnings=False, verbose=False): |
4138 | + """ Based on a taxonomy data set, amend the data to be at species level as |
4139 | + far as possible. This function creates an internal 'subs file' and calls |
4140 | + the standard substitution functions. The internal subs are generated by |
4141 | + looping over the taxa and if not at species-level, working out which level |
4142 | + they are at and then adding species already in the dataset to replace it |
4143 | + via a polytomy. This has to be done in one step to avoid adding spurious |
4144 | + structure to the phylogenies """ |
4145 | + |
4146 | + if not ignoreWarnings: |
4147 | + _check_data(XML) |
4148 | + |
4149 | + # if taxonomic checker not done, warn |
4150 | + if (not taxonomy): |
4151 | + raise NoneCompleteTaxonomy("Taxonomy is empty. Create a taxonomy first. You'll probably need to hand edit the file to complete") |
4152 | + return |
4153 | + |
4154 | + # if missing data in taxonomy, warn |
4155 | + taxa = get_all_taxa(XML) |
4156 | + keys = taxonomy.keys() |
4157 | + if (not ignoreWarnings): |
4158 | + for t in taxa: |
4159 | + t = t.replace("_"," ") |
4160 | + if not t in keys: |
4161 | + # This idea here is that the caller will catch this, then re-run with ignoreWarnings set to True |
4162 | + raise NoneCompleteTaxonomy("Taxonomy is not complete. I will soldier on anyway, but this might not work as intended") |
4163 | + |
4164 | + # get all taxa - see above! |
4165 | + # for each taxa, if not at species level |
4166 | + new_taxa = [] |
4167 | + old_taxa = [] |
4168 | + for t in taxa: |
4169 | + subs = [] |
4170 | + t = t.replace("_"," ") |
4171 | + if (not SPECIES in taxonomy[t]): # the current taxon is not a species, but higher level taxon |
4172 | + # work out which level - should we encode this in the data to start with? |
4173 | + for tl in taxonomy_levels: |
4174 | + try: |
4175 | + tax_data = taxonomy[t][tl] |
4176 | + except KeyError: |
4177 | + continue |
4178 | + if (t == taxonomy[t][tl]): |
4179 | + current_level = tl |
4180 | + # find all species in the taxonomy that match this level |
4181 | + for taxon in taxa: |
4182 | + taxon = taxon.replace("_"," ") |
4183 | + if (SPECIES in taxonomy[taxon]): |
4184 | + try: |
4185 | + if taxonomy[taxon][current_level] == t: # our current taxon |
4186 | + subs.append(taxon.replace(" ","_")) |
4187 | + except KeyError: |
4188 | + continue |
4189 | + |
4190 | + # create the sub |
4191 | + if len(subs) > 0: |
4192 | + old_taxa.append(t.replace(" ","_")) |
4193 | + new_taxa.append(','.join(subs)) |
4194 | + |
4195 | + # call the sub |
4196 | + new_XML = substitute_taxa(XML, old_taxa, new_taxa, verbose=verbose) |
4197 | + new_XML = clean_data(new_XML) |
4198 | + |
4199 | + return new_XML |
4200 | |
4201 | def data_overlap(XML, overlap_amount=2, filename=None, detailed=False, show=False, verbose=False, ignoreWarnings=False): |
4202 | """ Calculate the amount of taxonomic overlap between source trees. |
4203 | @@ -2024,7 +2724,7 @@ |
4204 | If filename is None, no graphic is generated. Otherwise a simple |
4205 | graphic is generated showing the number of cluster. If detailed is set to |
4206 | true, a graphic is generated showing *all* trees. For data containing >200 |
4207 | - source tres this could be very big and take along time. More likely, you'll run |
3208 | + source trees this could be very big and take a long time. More likely, you'll run |
4209 | out of memory. |
4210 | """ |
4211 | import matplotlib |
4212 | @@ -2103,6 +2803,7 @@ |
4213 | sufficient_overlap = True |
4214 | |
4215 | # The above list actually contains which components are seperate from each other |
4216 | + key_list = connected_components |
4217 | |
4218 | if (not filename == None or show): |
4219 | if (verbose): |
4220 | @@ -2266,7 +2967,9 @@ |
4221 | prev_char = None |
4222 | prev_taxa = None |
4223 | prev_name = None |
4224 | - non_ind = {} |
4225 | + subsets = [] |
4226 | + identical = [] |
4227 | + is_identical = False |
4228 | for data in data_ind: |
4229 | name = data[0] |
4230 | char = data[1] |
4231 | @@ -2275,22 +2978,71 @@ |
4232 | # when sorted, the longer list comes first |
4233 | if set(taxa).issubset(set(prev_taxa)): |
4234 | if (taxa == prev_taxa): |
4235 | - non_ind[name] = [prev_name,IDENTICAL] |
4236 | + if (is_identical): |
4237 | + identical[-1].append(name) |
4238 | + else: |
4239 | + identical.append([name,prev_name]) |
4240 | + is_identical = True |
4241 | + |
4242 | else: |
4243 | - non_ind[name] = [prev_name,SUBSET] |
4244 | + subsets.append([prev_name, name]) |
4245 | + prev_name = name |
4246 | + is_identical = False |
4247 | + else: |
4248 | + prev_name = name |
4249 | + is_identical = False |
4250 | + else: |
4251 | + prev_name = name |
4252 | + is_identical = False |
4253 | + |
4254 | prev_char = char |
4255 | prev_taxa = taxa |
4256 | - prev_name = name |
4257 | - |
4258 | + |
4259 | if (make_new_xml): |
4260 | new_xml = XML |
4261 | - for name in non_ind: |
4262 | - if (non_ind[name][1] == SUBSET): |
4263 | - new_xml = _swap_tree_in_XML(new_xml,None,name) |
4264 | + # deal with subsets |
4265 | + for s in subsets: |
4266 | + new_xml = _swap_tree_in_XML(new_xml,None,s[1]) |
4267 | new_xml = clean_data(new_xml) |
4268 | - return non_ind, new_xml |
4269 | +    # deal with identical trees - weight them: if there are 3 identical trees each
4270 | +    # gets weight 1/3, i.e. weights are 1/(number of identical trees)
4271 | + for i in identical: |
4272 | + weight = 1.0 / float(len(i)) |
4273 | + new_xml = add_weights(new_xml, i, weight) |
4274 | + |
4275 | + return identical, subsets, new_xml |
4276 | else: |
4277 | - return non_ind |
4278 | + return identical, subsets |
4279 | + |
4280 | + |
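# Sketch of the grouping pass above on toy data (names illustrative, helper hypothetical):
# consecutive entries with the same character and an identical taxa set are collected into
# one "identical" group; an entry whose taxa are a strict subset of the previous entry's
# is recorded as a (superset, subset) pair.
def group_identical_and_subsets(data_ind):
    """data_ind: [name, character, taxa] triples, pre-sorted so comparable entries are adjacent."""
    identical, subsets = [], []
    prev_name = prev_char = prev_taxa = None
    in_identical_run = False
    for name, char, taxa in data_ind:
        if char == prev_char and prev_taxa is not None and set(taxa).issubset(set(prev_taxa)):
            if taxa == prev_taxa:
                if in_identical_run:
                    identical[-1].append(name)
                else:
                    identical.append([prev_name, name])
                    in_identical_run = True
            else:
                subsets.append([prev_name, name])
                prev_name, in_identical_run = name, False
        else:
            prev_name, in_identical_run = name, False
        prev_char, prev_taxa = char, taxa
    return identical, subsets

# group_identical_and_subsets([["t1", "12S", ["A", "B", "C"]],
#                              ["t2", "12S", ["A", "B", "C"]],
#                              ["t3", "12S", ["A", "B", "C"]]])
# -> ([["t1", "t2", "t3"]], [])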
4281 | +def add_weights(XML, names, weight): |
4282 | + """ Add weights for tree, supply array of names and a weight, they get set |
4283 | + Returns a new XML |
4284 | + """ |
4285 | + |
4286 | + xml_root = _parse_xml(XML) |
4287 | + # By getting source, we can then loop over each source_tree |
4288 | + find = etree.XPath("//source_tree") |
4289 | + sources = find(xml_root) |
4290 | + for s in sources: |
4291 | + s_name = s.attrib['name'] |
4292 | + for n in names: |
4293 | + if s_name == n: |
4294 | + if s.xpath("tree/weight/real_value") == []: |
4295 | + # add weights |
4296 | + weights_element = etree.Element("weight") |
4297 | + weights_element.tail="\n" |
4298 | + real_value = etree.SubElement(weights_element,'real_value') |
4299 | + real_value.attrib['rank'] = '0' |
4300 | + real_value.tail = '\n' |
4301 | + real_value.text = str(weight) |
4302 | + t = s.xpath("tree")[0] |
4303 | + t.append(weights_element) |
4304 | + else: |
4305 | + s.xpath("tree/weight/real_value")[0].text = str(weight) |
4306 | + |
4307 | + return etree.tostring(xml_root,pretty_print=True) |
4308 | + |
4309 | |
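# Standalone illustration of the element add_weights() above inserts under each matching
# <tree>: a <weight> wrapping a rank-0 <real_value> whose text is the weight. The 0.5 here
# corresponds to a pair of identical trees (weight = 1/number of identical trees).
from lxml import etree

tree_element = etree.Element("tree")
weight_element = etree.SubElement(tree_element, "weight")
real_value = etree.SubElement(weight_element, "real_value")
real_value.attrib["rank"] = "0"
real_value.text = str(0.5)
print etree.tostring(tree_element, pretty_print=True)
# <tree>
#   <weight>
#     <real_value rank="0">0.5</real_value>
#   </weight>
# </tree>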
4310 | def add_historical_event(XML, event_description): |
4311 | """ |
4312 | @@ -2380,8 +3132,15 @@ |
4313 | # check trees are informative |
4314 | XML = _check_informative_trees(XML,delete=True) |
4315 | |
4316 | + |
4317 | # check sources |
4318 | XML = _check_sources(XML,delete=True) |
4319 | + XML = all_sourcenames(XML) |
4320 | + |
4321 | + # fix tree names |
4322 | + XML = set_unique_names(XML) |
4323 | + XML = set_all_tree_names(XML,overwrite=True) |
4324 | + |
4325 | |
4326 | # unpermutable trees |
4327 | permutable_trees = _find_trees_for_permuting(XML) |
4328 | @@ -2659,7 +3418,7 @@ |
4329 | s.getparent().remove(s) |
4330 | |
4331 | # edit name (append _subset) |
4332 | - proj_name = xml_root.xpath('/phylo_storage/project_name/string_value')[0].text |
4333 | + proj_name = get_project_name(XML) |
4334 | proj_name += "_subset" |
4335 | xml_root.xpath('/phylo_storage/project_name/string_value')[0].text = proj_name |
4336 | |
4337 | @@ -2928,6 +3687,37 @@ |
4338 | |
4339 | return mrca |
4340 | |
4341 | + |
4342 | +def tree_from_taxonomy(taxonomy, end_level, end_rank): |
4343 | + """Create a tree from a taxonomy data structure. |
4344 | +    Currently an outline only: the intended steps are sketched in the comments below.
4345 | + """ |
4346 | + |
4347 | + # Grab data only for the end_level classification |
4348 | + required_taxonomy = {} |
4349 | + for t in taxonomy: |
4350 | + if (end_level in t): |
4351 | + required_taxonomy[t] = taxonomy[t] |
4352 | + |
4353 | + rank_index = taxonomy_levels.index(end_rank) |
4354 | + |
4355 | + # create basic string |
4356 | + |
4357 | + # get unique otus |
4358 | + |
4359 | + # sort by the subfamily |
4360 | + |
4361 | + # for each genus create a newick string |
4362 | + |
4363 | + # if it's the same grouping as previous, add as sister clade (i.e. ,) |
4364 | +    # else, prepend a (, append a ) and add new clade (i.e. ,)
4365 | + |
4366 | + |
4367 | + # return tree |
4368 | + |
4369 | + |
4370 | + |
4371 | + |
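# A hedged sketch of what the outline above could become: group species-level names by
# their value at a chosen rank and emit a simple two-level newick string, one clade per
# group. The function name, rank defaults and toy data are illustrative only; this is a
# sketch, not the implementation this branch ships.
def newick_from_taxonomy(taxonomy, grouping_rank="genus", leaf_rank="species"):
    clades = {}
    for name, ranks in taxonomy.items():
        if leaf_rank not in ranks:
            continue  # only species-level entries become leaves
        group = ranks.get(grouping_rank, "unplaced")
        clades.setdefault(group, []).append(name.replace(" ", "_"))
    clade_strings = []
    for group in sorted(clades):
        members = sorted(clades[group])
        clade_strings.append("(" + ",".join(members) + ")" if len(members) > 1 else members[0])
    return "(" + ",".join(clade_strings) + ");"

# Example:
#   taxonomy = {"Gallus gallus": {"species": "Gallus gallus", "genus": "Gallus"},
#               "Gallus varius": {"species": "Gallus varius", "genus": "Gallus"},
#               "Ardea goliath": {"species": "Ardea goliath", "genus": "Ardea"}}
#   newick_from_taxonomy(taxonomy) -> "(Ardea_goliath,(Gallus_gallus,Gallus_varius));"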
4372 | ################ PRIVATE FUNCTIONS ######################## |
4373 | |
4374 | def _uniquify(l): |
4375 | @@ -2975,13 +3765,25 @@ |
4376 | "The source names in the dataset are not unique. Please run the auto-name function on these data. Name: "+name+"\n" |
4377 | last_name = name |
4378 | |
4379 | + # do same for tree names: |
4380 | + names = get_all_tree_names(XML) |
4381 | + names.sort() |
4382 | + last_name = "" # This will actually throw an non-unique error if a name is empty |
4383 | + # not great, but still an error! |
4384 | + for name in names: |
4385 | + if name == last_name: |
4386 | + # if non-unique throw exception |
4387 | + message = message + \ |
4388 | + "The tree names in the dataset are not unique. Please run the auto-name function on these data with replace or edit by hand. Name: "+name+"\n" |
4389 | + last_name = name |
4390 | + |
4391 | if (not message == ""): |
4392 | raise NotUniqueError(message) |
4393 | |
4394 | return |
4395 | |
4396 | |
4397 | -def _assemble_tree_matrix(tree_string): |
4398 | +def _assemble_tree_matrix(tree_string, verbose=False): |
4399 | """ Assembles the MRP matrix for an individual tree |
4400 | |
4401 | returns: matrix (2D numpy array: taxa on i, nodes on j) |
4402 | @@ -3009,7 +3811,7 @@ |
4403 | for i in range(0,len(names)): |
4404 | adjmat.append([1]) |
4405 | adjmat = numpy.array(adjmat) |
4406 | - |
4407 | + if verbose: |
4408 | print "Warning: Found uninformative tree in data. Including it in the matrix anyway" |
4409 | |
4410 | return adjmat, names |
4411 | @@ -3020,7 +3822,7 @@ |
4412 | |
4413 | If the new_taxa array is missing, simply delete the old_taxa |
4414 | """ |
4415 | - |
4416 | + |
4417 | tree = _correctly_quote_taxa(tree) |
4418 | # are the input values lists or simple strings? |
4419 | if (isinstance(old_taxa,str)): |
4420 | @@ -3564,7 +4366,7 @@ |
4421 | |
4422 | return permute_trees |
4423 | |
4424 | -def _create_matrix(trees, taxa, format="hennig", quote=False, weights=None): |
4425 | +def _create_matrix(trees, taxa, format="hennig", quote=False, weights=None, verbose=False): |
4426 | """ |
4427 | Does the hard work on creating a matrix |
4428 | """ |
4429 | @@ -3585,7 +4387,7 @@ |
4430 | if (not weights == None): |
4431 | weight = weights[key] |
4432 | names.append(key) |
4433 | - submatrix, tree_taxa = _assemble_tree_matrix(trees[key]) |
4434 | + submatrix, tree_taxa = _assemble_tree_matrix(trees[key], verbose=verbose) |
4435 | nChars = len(submatrix[0,:]) |
4436 | # loop over characters in the submatrix |
4437 | for i in range(1,nChars): |
4438 | @@ -3637,7 +4439,7 @@ |
4439 | matrix_string += string + "\n" |
4440 | i += 1 |
4441 | |
4442 | - matrix_string += "\t;\n" |
4443 | + matrix_string += "\n" |
4444 | if (not weights == None): |
4445 | # get unique weights |
4446 | unique_weights = _uniquify(weights) |
4447 | @@ -3652,7 +4454,7 @@ |
4448 | matrix_string += " " + str(i) |
4449 | i += 1 |
4450 | matrix_string += ";\n" |
4451 | - matrix_string += "procedure /;" |
4452 | + matrix_string += "proc /;" |
4453 | elif (format == 'nexus'): |
4454 | matrix_string = "#nexus\n\nbegin data;\n" |
4455 | matrix_string += "\tdimensions ntax = "+str(len(taxa)) +" nchar = "+str(last_char)+";\n" |
4456 | |
4457 | === modified file 'stk/test/_substitute_taxa.py' |
4458 | --- stk/test/_substitute_taxa.py 2016-07-14 10:12:17 +0000 |
4459 | +++ stk/test/_substitute_taxa.py 2017-01-12 09:27:31 +0000 |
4460 | @@ -10,6 +10,7 @@ |
4461 | from stk.supertree_toolkit import check_subs, _tree_contains, _correctly_quote_taxa, _remove_single_poly_taxa |
4462 | from stk.supertree_toolkit import _swap_tree_in_XML, substitute_taxa, get_all_taxa, _parse_tree, _delete_taxon |
4463 | from stk.supertree_toolkit import _collapse_nodes, import_tree, subs_from_csv, _getTaxaFromNewick, obtain_trees |
4464 | +from stk.supertree_toolkit import generate_species_level_data |
4465 | from lxml import etree |
4466 | from util import * |
4467 | from stk.stk_exceptions import * |
4468 | @@ -776,7 +777,24 @@ |
4469 | new_tree = _sub_taxa_in_tree(tree2,"Thereuopodina",sub_in,skip_existing=True); |
4470 | self.assert_(answer2, new_tree) |
4471 | |
4472 | - |
4473 | + |
4474 | + def test_auto_subs_taxonomy(self): |
4475 | + """test the automatic subs function with a simple test""" |
4476 | + XML = etree.tostring(etree.parse('data/input/auto_sub.phyml',parser),pretty_print=True) |
4477 | + taxonomy = {'Ardea goliath': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Ecdysozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Ardea', 'order': 'Pelecaniformes', 'species': 'Ardea goliath'}, |
4478 | + 'Pelecaniformes': {'kingdom': 'Animalia', 'phylum': 'Chordata', 'order': 'Pelecaniformes', 'class': 'Aves', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013'}, 'Gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes'}, |
4479 | + 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'infraphylum': 'Gnathostomata', 'superclass': 'Tetrapoda', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'order': 'Procellariiformes', 'species': 'Thalassarche melanophris'}, |
4480 | + 'Platalea leucorodia': {'kingdom': 'Animalia', 'subfamily': 'Plataleinae', 'family': 'Threskiornithidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'infraphylum': 'Gnathostomata', 'superclass': 'Tetrapoda', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Platalea', 'order': 'Pelecaniformes', 'species': 'Platalea leucorodia'}, |
4481 | + 'Gallus lafayetii': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus lafayetii'}, |
4482 | + 'Ardea humbloti': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Ecdysozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Ardea', 'order': 'Pelecaniformes', 'species': 'Ardea humbloti'}, |
4483 | + 'Gallus varius': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus varius'}} |
4484 | + XML = generate_species_level_data(XML, taxonomy) |
4485 | + expected_XML = etree.tostring(etree.parse('data/output/one_click_subs_output.phyml',parser),pretty_print=True) |
4486 | + trees = obtain_trees(XML) |
4487 | + expected_trees = obtain_trees(expected_XML) |
4488 | + for t in trees: |
4489 | + self.assert_(_trees_equal(trees[t], expected_trees[t])) |
4490 | + |
4491 | def test_parrot_edge_case(self): |
4492 | """Random edge case where the tree dissappeared...""" |
4493 | trees = ["(((((((Agapornis_lilianae, Agapornis_nigrigenis), Agapornis_personata, Agapornis_fischeri), Agapornis_roseicollis), (Agapornis_pullaria, Agapornis_taranta)), Agapornis_cana), Loriculus_galgulus), Geopsittacus_occidentalis);"] |
4494 | |
4495 | === modified file 'stk/test/_supertree_toolkit.py' |
4496 | --- stk/test/_supertree_toolkit.py 2015-03-26 09:58:58 +0000 |
4497 | +++ stk/test/_supertree_toolkit.py 2017-01-12 09:27:31 +0000 |
4498 | @@ -7,12 +7,13 @@ |
4499 | import os |
4500 | stk_path = os.path.join( os.path.realpath(os.path.dirname(__file__)), os.pardir, os.pardir ) |
4501 | sys.path.insert(0, stk_path) |
4502 | -from stk.supertree_toolkit import _check_uniqueness, _check_taxa, _check_data, get_all_characters, data_independence |
4503 | +from stk.supertree_toolkit import _check_uniqueness, _check_taxa, _check_data, get_all_characters, data_independence, add_weights |
4504 | from stk.supertree_toolkit import get_fossil_taxa, get_publication_years, data_summary, get_character_numbers, get_analyses_used |
4505 | from stk.supertree_toolkit import data_overlap, read_matrix, subs_file_from_str, clean_data, obtain_trees, get_all_source_names |
4506 | from stk.supertree_toolkit import add_historical_event, _sort_data, _parse_xml, _check_sources, _swap_tree_in_XML, replace_genera |
4507 | from stk.supertree_toolkit import get_all_taxa, _get_all_siblings, _parse_tree, get_characters_used, _trees_equal, get_weights |
4508 | -from stk.supertree_toolkit import get_outgroup, set_all_tree_names, create_tree_name, load_taxonomy |
4509 | +from stk.supertree_toolkit import get_outgroup, set_all_tree_names, create_tree_name, taxonomic_checker, load_taxonomy, load_equivalents |
4510 | +from stk.supertree_toolkit import create_taxonomy, create_taxonomy_from_tree, get_all_tree_names |
4511 | from lxml import etree |
4512 | from util import * |
4513 | from stk.stk_exceptions import * |
4514 | @@ -268,19 +269,52 @@ |
4515 | |
4516 | def test_data_independence(self): |
4517 | XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True) |
4518 | - expected_dict = {'Hill_2011_2': ['Hill_2011_1', 1], 'Hill_Davis_2011_1': ['Hill_Davis_2011_2', 0]} |
4519 | - non_ind = data_independence(XML) |
4520 | - self.assertDictEqual(expected_dict, non_ind) |
4521 | + expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']] |
4522 | + non_ind,subsets = data_independence(XML) |
4523 | + expected_subsets = [['Hill_2011_1', 'Hill_2011_2']] |
4524 | + self.assertListEqual(expected_subsets, subsets) |
4525 | + self.assertListEqual(expected_idents, non_ind) |
4526 | |
4527 | - def test_data_independence(self): |
4528 | + def test_data_independence_2(self): |
4529 | XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True) |
4530 | - expected_dict = {'Hill_2011_2': ['Hill_2011_1', 1], 'Hill_Davis_2011_1': ['Hill_Davis_2011_2', 0]} |
4531 | - non_ind, new_xml = data_independence(XML,make_new_xml=True) |
4532 | - self.assertDictEqual(expected_dict, non_ind) |
4533 | + expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']] |
4534 | + expected_subsets = [['Hill_2011_1', 'Hill_2011_2']] |
4535 | + non_ind, subset, new_xml = data_independence(XML,make_new_xml=True) |
4536 | + self.assertListEqual(expected_idents, non_ind) |
4537 | + self.assertListEqual(expected_subsets, subset) |
4538 | # check the second tree has not been removed |
4539 | self.assertRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,H:1.00000)0.00000:0.00000)0.00000:0.00000;')) |
4540 | # check that the first tree is removed |
4541 | self.assertNotRegexpMatches(new_xml,re.escape('((A:1.00000,B:1.00000)0.00000:0.00000,(F:1.00000,E:1.00000)0.00000:0.00000)0.00000:0.00000;')) |
4542 | + |
4543 | + def test_add_weights(self): |
4544 | + """Add weights to a bunch of trees""" |
4545 | + XML = etree.tostring(etree.parse('data/input/check_data_ind.phyml',parser),pretty_print=True) |
4546 | + # see above |
4547 | + expected_idents = [['Hill_Davis_2011_2', 'Hill_Davis_2011_1', 'Hill_Davis_2011_3'], ['Hill_Davis_2013_1', 'Hill_Davis_2013_2']] |
4548 | + # so the first should end up with a weight of 0.33333 and the second with 0.5 |
4549 | + for ei in expected_idents: |
4550 | + weight = 1.0/float(len(ei)) |
4551 | + XML = add_weights(XML, ei, weight) |
4552 | + |
4553 | + expected_weights = [str(1.0/3.0), str(1.0/3.0), str(1.0/3.0), str(0.5), str(0.5)] |
4554 | + weights_in_xml = [] |
4555 | + # now check weights have been added to the correct part of the tree |
4556 | + xml_root = _parse_xml(XML) |
4557 | + i = 0 |
4558 | + for ei in expected_idents: |
4559 | + for tree in ei: |
4560 | + find = etree.XPath("//source_tree") |
4561 | + trees = find(xml_root) |
4562 | + for t in trees: |
4563 | + if t.attrib['name'] == tree: |
4564 | +                    # grab the weight value stored on this tree
4565 | + weights_in_xml.append(t.xpath("tree/weight/real_value")[0].text) |
4566 | + |
4567 | + self.assertListEqual(expected_weights,weights_in_xml) |
4568 | + |
4569 | + |
4570 | + |
4571 | |
4572 | def test_overlap(self): |
4573 | XML = etree.tostring(etree.parse('data/input/check_overlap_ok.phyml',parser),pretty_print=True) |
4574 | @@ -438,7 +472,7 @@ |
4575 | XML = clean_data(XML) |
4576 | trees = obtain_trees(XML) |
4577 | self.assert_(len(trees) == 2) |
4578 | - expected_trees = {'Hill_2011_4': '(A,B,(C,D,E));', 'Hill_2011_2': '(A, B, C, (D, E, F));'} |
4579 | + expected_trees = {'Hill_2011_2': '(A,B,(C,D,E));', 'Hill_2011_1': '(A, B, C, (D, E, F));'} |
4580 | for t in trees: |
4581 | self.assert_(_trees_equal(trees[t],expected_trees[t])) |
4582 | |
4583 | @@ -558,18 +592,78 @@ |
4584 | self.assert_(c in expected_characters) |
4585 | self.assert_(len(characters) == len(expected_characters)) |
4586 | |
4587 | + def test_create_taxonomy(self): |
4588 | + XML = etree.tostring(etree.parse('data/input/create_taxonomy.phyml',parser),pretty_print=True) |
4589 | + # Tested on 11/01/17 and EOL have changed the output |
4590 | + # old_expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'superclass': 'Sarcopterygii', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Egretta tricolor', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Jeletzkytes criptonodosus': {'superfamily': 'Scaphitoidea', 'family': 'Scaphitidae', 'subkingdom': 'Metazoa', 'subclass': 'Ammonoidea', 'species': 'Jeletzkytes criptonodosus', 'phylum': 'Mollusca', 'suborder': 'Ancyloceratina', 'provider': 'Paleobiology Database', 'genus': 'Jeletzkytes', 'class': 'Cephalopoda'}} |
4591 | + expected = {'Jeletzkytes criptonodosus': {'superfamily': 'Scaphitoidea', 'family': 'Scaphitidae', 'subkingdom': 'Metazoa', 'subclass': 'Ammonoidea', 'species': 'Jeletzkytes criptonodosus', 'phylum': 'Mollusca', 'suborder': 'Ancyloceratina', 'provider': 'Paleobiology Database', 'genus': 'Jeletzkytes', 'class': 'Cephalopoda'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'infraspecies': 'Egretta', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': ['Egretta', 'tricolor'], 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Archaeopteryx lithographica': {'genus': 'Archaeopteryx', 'provider': 'Paleobiology Database'}} |
4592 | + if (internet_on()): |
4593 | + taxonomy = create_taxonomy(XML) |
4594 | + self.maxDiff = None |
4595 | + self.assertDictEqual(taxonomy, expected) |
4596 | + else: |
4597 | +            print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the create_taxonomy function"
4598 | + return |
4599 | + |
4600 | + def test_create_taxonomy_from_tree(self): |
4601 | + """Tests if taxonomy from tree works. Uses same data for normal XML test but goes directly for the tree instead of parsing the XML """ |
4602 | +        # Tested on 11/01/17: the old expected result below no longer matched because EOL returned something different; the current expected dict is correct.
4603 | + #old_expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'superclass': 'Sarcopterygii', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Egretta tricolor', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}} |
4604 | + expected = {'Archaeopteryx lithographica': {'genus': 'Archaeopteryx', 'provider': 'Paleobiology Database'}, 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'infraspecies': 'Egretta', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': ['Egretta', 'tricolor'], 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus gallus': {'kingdom': 'Animalia', 'family': 'Phasianidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'genus': 'Gallus', 'order': 'Galliformes'}, 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Thalassarche melanophris', 'genus': 'Thalassarche', 'order': 'Procellariiformes'}} |
4605 | + tree = "(Archaeopteryx_lithographica, (Gallus_gallus, (Thalassarche_melanophris, Egretta_tricolor)));" |
4606 | + if (internet_on()): |
4607 | + taxonomy = create_taxonomy_from_tree(tree) |
4608 | + self.maxDiff = None |
4609 | + self.assertDictEqual(taxonomy, expected) |
4610 | + else: |
4611 | +            print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the create_taxonomy_from_tree function"
4612 | + return |
4613 | + |
4614 | + def test_taxonomy_checker(self): |
4615 | + expected = {'Thalassarche_melanophrys': [['Thalassarche_melanophris', 'Thalassarche_melanophrys', 'Diomedea_melanophris', 'Thalassarche_[melanophrys', 'Diomedea_melanophrys'], 'amber'], 'Egretta_tricolor': [['Egretta_tricolor'], 'green'], 'Gallus_gallus': [['Gallus_gallus'], 'green']} |
4616 | + XML = etree.tostring(etree.parse('data/input/check_taxonomy.phyml',parser),pretty_print=True) |
4617 | + if (internet_on()): |
4618 | + equivs = taxonomic_checker(XML) |
4619 | + self.maxDiff = None |
4620 | + self.assertDictEqual(equivs, expected) |
4621 | + else: |
4622 | + print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the taxonomy_checker function" |
4623 | + return |
4624 | + |
4625 | + def test_taxonomy_checker2(self): |
4626 | + XML = etree.tostring(etree.parse('data/input/check_taxonomy_fixes.phyml',parser),pretty_print=True) |
4627 | + if (internet_on()): |
4628 | + # This test is a bit dodgy as it depends on EOL's server speed. Run it a few times before deciding it's broken. |
4629 | + equivs = taxonomic_checker(XML,verbose=False) |
4630 | + self.maxDiff = None |
4631 | + self.assert_(equivs['Agathamera_crassa'][0][0] == 'Agathemera_crassa') |
4632 | + self.assert_(equivs['Celatoblatta_brunni'][0][0] == 'Maoriblatta_brunni') |
4633 | + self.assert_(equivs['Blatta_lateralis'][1] == 'amber') |
4634 | + else: |
4635 | + print bcolors.WARNING + "WARNING: "+ bcolors.ENDC+ "No internet connection found. Not checking the taxonomy_checker function" |
4636 | + return |
4637 | + |
4638 | + |
4639 | def test_load_taxonomy(self): |
4640 | csv_file = "data/input/create_taxonomy.csv" |
4641 | - expected = {'Archaeopteryx lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, |
4642 | - 'Egretta tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'subkingdom': 'Bilateria', 'subclass': 'Neoloricata', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'suborder': 'Ischnochitonina', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Protostomia', 'genus': 'Egretta', 'order': 'Pelecaniformes', 'species': 'Egretta tricolor'}, |
4643 | - 'Gallus gallus': {'kingdom': 'Animalia', 'infrakingdom': 'Protostomia', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'genus': 'Gallus', 'order': 'Galliformes', 'species': 'Gallus gallus'}, |
4644 | - 'Thalassarche melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'class': 'Aves', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'order': 'Procellariiformes', 'species': 'Thalassarche melanophris'}, |
4645 | - 'Jeletzkytes criptonodosus': {'kingdom': 'Metazoa', 'family': 'Scaphitidae', 'order': 'Ammonoidea', 'phylum': 'Mollusca', 'provider': 'PBDB', 'species': 'Jeletzkytes criptonodosus', 'class': 'Cephalopoda'}} |
4646 | + expected = {'Jeletzkytes_criptonodosus': {'kingdom': 'Metazoa', 'subclass': 'Cephalopoda', 'species': 'Jeletzkytes criptonodosus', 'suborder': 'Ammonoidea', 'provider': 'PBDB', 'subfamily': 'Scaphitidae', 'class': 'Mollusca'}, 'Archaeopteryx_lithographica': {'subkingdom': 'Metazoa', 'subclass': 'Tetrapodomorpha', 'suborder': 'Coelurosauria', 'provider': 'Paleobiology Database', 'genus': 'Archaeopteryx', 'class': 'Aves'}, 'Egretta_tricolor': {'kingdom': 'Animalia', 'family': 'Ardeidae', 'class': 'Aves', 'subkingdom': 'Bilateria', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'subclass': 'Neoloricata', 'species': 'Egretta tricolor', 'phylum': 'Chordata', 'suborder': 'Ischnochitonina', 'superphylum': 'Lophozoa', 'infrakingdom': 'Protostomia', 'genus': 'Egretta', 'order': 'Pelecaniformes'}, 'Gallus_gallus': {'kingdom': 'Animalia', 'superorder': 'Galliformes', 'family': 'Phasianidae', 'subkingdom': 'Bilateria', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'species': 'Gallus gallus', 'phylum': 'Chordata', 'superphylum': 'Lophozoa', 'infrakingdom': 'Protostomia', 'genus': 'Gallus', 'class': 'Aves'}, 'Thalassarche_melanophris': {'kingdom': 'Animalia', 'family': 'Diomedeidae', 'subkingdom': 'Bilateria', 'species': 'Thalassarche melanophris', 'order': 'Procellariiformes', 'phylum': 'Chordata', 'provider': 'Species 2000 & ITIS Catalogue of Life: April 2013', 'infrakingdom': 'Deuterostomia', 'subphylum': 'Vertebrata', 'genus': 'Thalassarche', 'class': 'Aves'}} |
4647 | taxonomy = load_taxonomy(csv_file) |
4648 | self.maxDiff = None |
4649 | |
4650 | self.assertDictEqual(taxonomy, expected) |
4651 | |
4652 | + |
4653 | + def test_load_equivalents(self): |
4654 | + csv_file = "data/input/equivalents.csv" |
4655 | + expected = {'Turnix_sylvatica': [['Turnix_sylvaticus','Tetrao_sylvaticus','Tetrao_sylvatica','Turnix_sylvatica'],'yellow'], |
4656 | + 'Xiphorhynchus_pardalotus':[['Xiphorhynchus_pardalotus'],'green'], |
4657 | + 'Phaenicophaeus_curvirostris':[['Zanclostomus_curvirostris','Rhamphococcyx_curvirostris','Phaenicophaeus_curvirostris','Rhamphococcyx_curvirostr'],'yellow'], |
4658 | + 'Megalapteryx_benhami':[['Megalapteryx_benhami'],'red'] |
4659 | + } |
4660 | + equivalents = load_equivalents(csv_file) |
4661 | + self.assertDictEqual(equivalents, expected) |
4662 | + |
4663 | + |
4664 | def test_name_tree(self): |
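# Hypothetical sketch of a loader producing the {name: [[equivalents], colour]} structure
# asserted above. The column layout assumed here (taxon name, semicolon-separated
# equivalents, status colour) is a guess for illustration only; the real equivalents.csv
# format and load_equivalents implementation may differ.
import csv

def load_equivalents_sketch(csv_file):
    equivalents = {}
    with open(csv_file, "rb") as f:  # "rb" for the Python 2 csv module
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or malformed rows
            name, equivs, colour = row[0], row[1], row[2]
            equivalents[name.strip().replace(" ", "_")] = [
                [e.strip().replace(" ", "_") for e in equivs.split(";")],
                colour.strip()]
    return equivalents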
4665 | XML = etree.tostring(etree.parse('data/input/single_source_no_names.phyml',parser),pretty_print=True) |
4666 | xml_root = _parse_xml(XML) |
4667 | @@ -583,6 +677,35 @@ |
4668 | XML = etree.tostring(etree.parse('data/input/single_source.phyml',parser),pretty_print=True) |
4669 | self.assert_(isEqualXML(new_xml,XML)) |
4670 | |
4671 | + def test_all_rename_tree(self): |
4672 | + XML = etree.tostring(etree.parse('data/input/single_source_same_tree_name.phyml',parser),pretty_print=True) |
4673 | + new_xml = set_all_tree_names(XML,overwrite=True) |
4674 | + XML = etree.tostring(etree.parse('data/output/single_source_same_tree_name.phyml',parser),pretty_print=True) |
4675 | + self.assert_(isEqualXML(new_xml,XML)) |
4676 | + |
4677 | + def test_get_all_tree_names(self): |
4678 | + XML = etree.tostring(etree.parse('data/input/single_source_same_tree_name.phyml',parser),pretty_print=True) |
4679 | + names = get_all_tree_names(XML) |
4680 | + self.assertListEqual(names,['Hill_2011_2','Hill_2011_2']) |
4681 | + |
4682 | + |
4683 | +def internet_on(host="8.8.8.8", port=443, timeout=5): |
4684 | + import socket |
4685 | + |
4686 | + """ |
4687 | + Host: 8.8.8.8 (google-public-dns-a.google.com) |
4688 | + OpenPort: 53/tcp |
4689 | + Service: domain (DNS/TCP) |
4690 | + """ |
4691 | + try: |
4692 | + socket.setdefaulttimeout(timeout) |
4693 | + socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect((host, port)) |
4694 | + return True |
4695 | + except Exception as ex: |
4696 | + print ex.message |
4697 | + return False |
4698 | + |
4699 | + |
4700 | |
4701 | if __name__ == '__main__': |
4702 | unittest.main() |
4703 | |
4704 | === modified file 'stk/test/_trees.py' |
4705 | --- stk/test/_trees.py 2015-03-26 09:58:58 +0000 |
4706 | +++ stk/test/_trees.py 2017-01-12 09:27:31 +0000 |
4707 | @@ -5,7 +5,7 @@ |
4708 | sys.path.insert(0,"../../") |
4709 | from stk.supertree_toolkit import import_tree, obtain_trees, get_all_taxa, _assemble_tree_matrix, create_matrix, _delete_taxon, _sub_taxon,_tree_contains |
4710 | from stk.supertree_toolkit import _swap_tree_in_XML, substitute_taxa, get_taxa_from_tree, get_characters_from_tree, amalgamate_trees, _uniquify |
4711 | -from stk.supertree_toolkit import import_trees, import_tree, _trees_equal, _find_trees_for_permuting, permute_tree, get_all_source_names, _getTaxaFromNewick |
4712 | +from stk.supertree_toolkit import import_trees, import_tree, _trees_equal, _find_trees_for_permuting, permute_tree, get_all_source_names, _getTaxaFromNewick, _parse_tree |
4713 | from stk.supertree_toolkit import get_mrca |
4714 | import os |
4715 | from lxml import etree |
4716 | @@ -215,6 +215,18 @@ |
4717 | mrca = get_mrca(tree,["A","I", "L"]) |
4718 | self.assert_(mrca == 8) |
4719 | |
4720 | +    def test_get_mrca_2(self):
4721 | + tree = "(B,(C,(D,(E,((A,F),((I,(G,H)),(J,(K,L))))))));" |
4722 | + mrca = get_mrca(tree,["A","F"]) |
4723 | + print mrca |
4724 | + #self.assert_(mrca == 8) |
4725 | + to = _parse_tree('(X,Y,Z,(Q,W));') |
4726 | + treeobj = _parse_tree(tree) |
4727 | + newnode = treeobj.addNodeBetweenNodes(10,9) |
4728 | + treeobj.addSubTree(newnode, to, ignoreRootAssert=True) |
4729 | + treeobj.draw() |
4730 | + |
4731 | + |
4732 | def test_get_all_trees(self): |
4733 | XML = etree.tostring(etree.parse(single_source_input,parser),pretty_print=True) |
4734 | tree = obtain_trees(XML) |
4735 | |
4736 | === added file 'stk/test/data/input/auto_sub.phyml' |
4737 | --- stk/test/data/input/auto_sub.phyml 1970-01-01 00:00:00 +0000 |
4738 | +++ stk/test/data/input/auto_sub.phyml 2017-01-12 09:27:31 +0000 |
4739 | @@ -0,0 +1,97 @@ |
4740 | +<?xml version='1.0' encoding='utf-8'?> |
4741 | +<phylo_storage> |
4742 | + <project_name> |
4743 | + <string_value lines="1">Test</string_value> |
4744 | + </project_name> |
4745 | + <sources> |
4746 | + <source name="Hill_2011"> |
4747 | + <bibliographic_information> |
4748 | + <article> |
4749 | + <authors> |
4750 | + <author> |
4751 | + <surname> |
4752 | + <string_value lines="1">Hill</string_value> |
4753 | + </surname> |
4754 | + <other_names> |
4755 | + <string_value lines="1">Jon</string_value> |
4756 | + </other_names> |
4757 | + </author> |
4758 | + </authors> |
4759 | + <title> |
4760 | + <string_value lines="1">A great paper</string_value> |
4761 | + </title> |
4762 | + <year> |
4763 | + <integer_value rank="0">2011</integer_value> |
4764 | + </year> |
4765 | + <journal> |
4766 | + <string_value lines="1">Nature</string_value> |
4767 | + </journal> |
4768 | + <pages> |
4769 | + <string_value lines="1">1-12</string_value> |
4770 | + </pages> |
4771 | + </article> |
4772 | + </bibliographic_information> |
4773 | + <source_tree name="Hill_2011_1"> |
4774 | + <tree> |
4775 | + <tree_string> |
4776 | + <string_value lines="1">(Thalassarche_melanophris, Pelecaniformes, (Gallus, Gallus_varius));</string_value> |
4777 | + </tree_string> |
4778 | + <figure_legend> |
4779 | + <string_value lines="1">NA</string_value> |
4780 | + </figure_legend> |
4781 | + <figure_number> |
4782 | + <string_value lines="1">1</string_value> |
4783 | + </figure_number> |
4784 | + <page_number> |
4785 | + <string_value lines="1">1</string_value> |
4786 | + </page_number> |
4787 | + <tree_inference> |
4788 | + <optimality_criterion name="Maximum Parsimony"/> |
4789 | + </tree_inference> |
4790 | + <topology> |
4791 | + <outgroup> |
4792 | + <string_value lines="1">A</string_value> |
4793 | + </outgroup> |
4794 | + </topology> |
4795 | + </tree> |
4796 | + <taxa_data> |
4797 | + <all_extant/> |
4798 | + </taxa_data> |
4799 | + <character_data> |
4800 | + <character type="molecular" name="12S"/> |
4801 | + </character_data> |
4802 | + </source_tree> |
4803 | + <source_tree name="Hill_2011_2"> |
4804 | + <tree> |
4805 | + <tree_string> |
4806 | + <string_value lines="1">(Gallus_lafayetii, (Platalea_leucorodia, (Ardea_humbloti, Ardea_goliath)));</string_value> |
4807 | + </tree_string> |
4808 | + <figure_legend> |
4809 | + <string_value lines="1">NA</string_value> |
4810 | + </figure_legend> |
4811 | + <figure_number> |
4812 | + <string_value lines="1">1</string_value> |
4813 | + </figure_number> |
4814 | + <page_number> |
4815 | + <string_value lines="1">1</string_value> |
4816 | + </page_number> |
4817 | + <tree_inference> |
4818 | + <optimality_criterion name="Maximum Parsimony"/> |
4819 | + </tree_inference> |
4820 | + <topology> |
4821 | + <outgroup> |
4822 | + <string_value lines="1">A</string_value> |
4823 | + </outgroup> |
4824 | + </topology> |
4825 | + </tree> |
4826 | + <taxa_data> |
4827 | + <all_extant/> |
4828 | + </taxa_data> |
4829 | + <character_data> |
4830 | + <character type="molecular" name="12S"/> |
4831 | + </character_data> |
4832 | + </source_tree> |
4833 | + </source> |
4834 | + </sources> |
4835 | + <history/> |
4836 | +</phylo_storage> |
4837 | |
4838 | === modified file 'stk/test/data/input/check_data_ind.phyml' |
4839 | --- stk/test/data/input/check_data_ind.phyml 2014-10-09 09:33:21 +0000 |
4840 | +++ stk/test/data/input/check_data_ind.phyml 2017-01-12 09:27:31 +0000 |
4841 | @@ -249,6 +249,147 @@ |
4842 | <character type="molecular" name="12S"/> |
4843 | </character_data> |
4844 | </source_tree> |
4845 | + <source_tree name="Hill_Davis_2011_3"> |
4846 | + <tree> |
4847 | + <tree_string> |
4848 | + <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,H:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value> |
4849 | + </tree_string> |
4850 | + <figure_legend> |
4851 | + <string_value lines="1">NA</string_value> |
4852 | + </figure_legend> |
4853 | + <figure_number> |
4854 | + <string_value lines="1">0</string_value> |
4855 | + </figure_number> |
4856 | + <page_number> |
4857 | + <string_value lines="1">0</string_value> |
4858 | + </page_number> |
4859 | + <tree_inference> |
4860 | + <optimality_criterion name="Maximum Parsimony"/> |
4861 | + </tree_inference> |
4862 | + <topology> |
4863 | + <outgroup> |
4864 | + <string_value lines="1">A</string_value> |
4865 | + </outgroup> |
4866 | + </topology> |
4867 | + </tree> |
4868 | + <taxa_data> |
4869 | + <mixed_fossil_and_extant> |
4870 | + <taxon name="A"> |
4871 | + <fossil/> |
4872 | + </taxon> |
4873 | + <taxon name="B"> |
4874 | + <fossil/> |
4875 | + </taxon> |
4876 | + </mixed_fossil_and_extant> |
4877 | + </taxa_data> |
4878 | + <character_data> |
4879 | + <character type="molecular" name="12S"/> |
4880 | + </character_data> |
4881 | + </source_tree> |
4882 | + </source> |
4883 | + <source name="Hill_Davis_2013"> |
4884 | + <bibliographic_information> |
4885 | + <article> |
4886 | + <authors> |
4887 | + <author> |
4888 | + <surname> |
4889 | + <string_value lines="1">Hill</string_value> |
4890 | + </surname> |
4891 | + <other_names> |
4892 | + <string_value lines="1">Jon</string_value> |
4893 | + </other_names> |
4894 | + </author> |
4895 | + <author> |
4896 | + <surname> |
4897 | + <string_value lines="1">Davis</string_value> |
4898 | + </surname> |
4899 | + <other_names> |
4900 | + <string_value lines="1">Katie</string_value> |
4901 | + </other_names> |
4902 | + </author> |
4903 | + </authors> |
4904 | + <title> |
4905 | + <string_value lines="1">Another superb paper</string_value> |
4906 | + </title> |
4907 | + <year> |
4908 | + <integer_value rank="0">2013</integer_value> |
4909 | + </year> |
4910 | + </article> |
4911 | + </bibliographic_information> |
4912 | + <source_tree name="Hill_Davis_2013_1"> |
4913 | + <tree> |
4914 | + <tree_string> |
4915 | + <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,Z:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value> |
4916 | + </tree_string> |
4917 | + <figure_legend> |
4918 | + <string_value lines="1">NA</string_value> |
4919 | + </figure_legend> |
4920 | + <figure_number> |
4921 | + <string_value lines="1">0</string_value> |
4922 | + </figure_number> |
4923 | + <page_number> |
4924 | + <string_value lines="1">0</string_value> |
4925 | + </page_number> |
4926 | + <tree_inference> |
4927 | + <optimality_criterion name="Maximum Parsimony"/> |
4928 | + </tree_inference> |
4929 | + <topology> |
4930 | + <outgroup> |
4931 | + <string_value lines="1">A</string_value> |
4932 | + </outgroup> |
4933 | + </topology> |
4934 | + </tree> |
4935 | + <taxa_data> |
4936 | + <mixed_fossil_and_extant> |
4937 | + <taxon name="A"> |
4938 | + <fossil/> |
4939 | + </taxon> |
4940 | + <taxon name="B"> |
4941 | + <fossil/> |
4942 | + </taxon> |
4943 | + </mixed_fossil_and_extant> |
4944 | + </taxa_data> |
4945 | + <character_data> |
4946 | + <character type="molecular" name="12S"/> |
4947 | + </character_data> |
4948 | + </source_tree> |
4949 | + <source_tree name="Hill_Davis_2013_2"> |
4950 | + <tree> |
4951 | + <tree_string> |
4952 | + <string_value lines="1">((A:1.00000,B:1.00000)0.00000:0.00000,F:1.00000,E:1.00000,(G:1.00000,Z:1.00000)0.00000:0.00000)0.00000:0.00000;</string_value> |
4953 | + </tree_string> |
4954 | + <figure_legend> |
4955 | + <string_value lines="1">NA</string_value> |
4956 | + </figure_legend> |
4957 | + <figure_number> |
4958 | + <string_value lines="1">0</string_value> |
4959 | + </figure_number> |
4960 | + <page_number> |
4961 | + <string_value lines="1">0</string_value> |
4962 | + </page_number> |
4963 | + <tree_inference> |
4964 | + <optimality_criterion name="Maximum Parsimony"/> |
4965 | + </tree_inference> |
4966 | + <topology> |
4967 | + <outgroup> |
4968 | + <string_value lines="1">A</string_value> |
4969 | + </outgroup> |
4970 | + </topology> |
4971 | + </tree> |
4972 | + <taxa_data> |
4973 | + <mixed_fossil_and_extant> |
4974 | + <taxon name="A"> |
4975 | + <fossil/> |
4976 | + </taxon> |
4977 | + <taxon name="B"> |
4978 | + <fossil/> |
4979 | + </taxon> |
4980 | + </mixed_fossil_and_extant> |
4981 | + </taxa_data> |
4982 | + <character_data> |
4983 | + <character type="molecular" name="12S"/> |
4984 | + </character_data> |
4985 | + </source_tree> |
4986 | </source> |
4987 | </sources> |
4988 | <history/> |
4989 | |
4990 | === added file 'stk/test/data/input/check_taxonomy.phyml' |
4991 | --- stk/test/data/input/check_taxonomy.phyml 1970-01-01 00:00:00 +0000 |
4992 | +++ stk/test/data/input/check_taxonomy.phyml 2017-01-12 09:27:31 +0000 |
4993 | @@ -0,0 +1,67 @@ |
4994 | +<?xml version='1.0' encoding='utf-8'?> |
4995 | +<phylo_storage> |
4996 | + <project_name> |
4997 | + <string_value lines="1">Test</string_value> |
4998 | + </project_name> |
4999 | + <sources> |
5000 | + <source name="Hill_2011"> |
The diff has been truncated for viewing.