Status: | Needs review |
---|---|
Proposed branch: | lp:~wking-drexel/pybtex/wtk |
Merge into: | lp:pybtex |
Diff against target: | 769 lines (+584/-10), 8 files modified: pybtex/core.py (+87/-1), pybtex/database/__init__.py (+137/-0), pybtex/database/input/bibtex.py (+90/-4), pybtex/database/input/bibtexml.py (+1/-0), pybtex/database/input/bibyaml.py (+1/-0), pybtex/database/output/__init__.py (+2/-2), pybtex/database/output/bibtex.py (+4/-3), pybtex/database/output/clean_bibtex.py (+262/-0) |
To merge this branch: | bzr merge lp:~wking-drexel/pybtex/wtk |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Andrey Golovizin | Pending | ||
Review via email: mp+38535@code.launchpad.net |
Commit message
Description of the change
Document a bug with whitespace normalization in concatenated BibTeX strings. Add workaround for AUthor #' and '# OTher case.
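The whitespace bug described above can be illustrated with a minimal sketch. It assumes that the stock normalization strips each token and collapses internal whitespace; `normalize_whitespace` and `normalize_token` below are illustrative stand-ins rather than the branch's functions, and the expanded author names are made up.

```python
import re


def normalize_whitespace(text):
    """Assumed stock behaviour: strip both ends, collapse whitespace runs."""
    return re.sub(r"\s+", " ", text.strip())


def normalize_token(token):
    """Workaround: keep the spaces around a bare "and" separator."""
    if token.strip() == "and":
        return " and "
    return normalize_whitespace(token)


# Tokens of a concatenated value such as: author = AUthor #" and\n"# OTher
# (macro values already expanded; the names are hypothetical).
pieces = ["Uthor, A.", " and\n", "Ther, O."]

naive = "".join(normalize_whitespace(p) for p in pieces)
fixed = "".join(normalize_token(p) for p in pieces)

print(naive)  # the separator loses its surrounding spaces
print(fixed)  # the separator survives as " and "
```

Checking `token.strip()` rather than the raw token is what makes the workaround tolerate a line break inside the quoted separator.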
- 680. By Trevor King
-
Don't check end brace level of empty strings in database.output.bibtex.
- 681. By Trevor King
-
FormattedEntry should subclass object.
Old-style classes are evil!
- 682. By Trevor King
-
Add optional automatic key generator to increase database consistency.
- 683. By Trevor King
-
Store re-keyed entries in the automatic key generator ;).
- 684. By Trevor King
-
Strip non-letter characters from author lastnames in auto-generated keys.
- 685. By Trevor King
-
Give database authors a mechanism for avoiding auto-generated keys.
- 686. By Trevor King
-
Also allow '-' in auto-generated keys.
- 687. By Trevor King
-
Add Entry.ordered_field_items() for consistent database output.
- 688. By Trevor King
-
Raise PybtexError if a duplicate BibTeX field would clobber a previous value.
- 689. By Trevor King
-
Improve key auto-generation for von-containing names.
- 690. By Trevor King
-
Add doctest showing that comments are parsed in BibTeX input.
- 691. By Trevor King
-
Simplify comment parsing doctest and add Parser.raw_macros (preserving case).
- 692. By Trevor King
-
Handle simple comments in the bibtex parser.
- 693. By Trevor King
-
.parse_stream() should return .data to match .parse_file().
- 694. By Trevor King
-
Change BibTeX comment regexp to ignore indented comments.
This is at the expense of possibly handling escaped percent signs
(\%), which are not handled correctly at the moment (untested).
Luckily, they probably don't show up too often in BibTeX files.
- 695. By Trevor King
-
Demonstrate that the BibTeX parser doesn't treat %s in strings as comments.
- 696. By Trevor King
-
Make (raw_macro,value) dict generation a method: .get_raw_macros().
- 697. By Trevor King
-
Add database.output.clean_bibtex for generating easy-to-maintain output BibTeX.
Example usage to clean up an inconsistent BibTeX file:
import sys
from pybtex.database.input.bibtex import Parser
from pybtex.database.output.clean_bibtex import Writer
p = Parser()
d = p.parse_file("/path/to/old/file.bib")
w = Writer()
w.write_stream(d, sys.stdout, p.get_raw_macros())
Information about new macros and key changes will be logged to stderr.
- 698. By Trevor King
-
Pass extra Writer.write_file() arguments on to Writer.write_stream().
This allows us to pass in a list of macros for the clean_bibtex.Writer.
- 699. By Trevor King
-
Fix unicode/string TypeError in clean_bibtex.Writer.write_file().
This avoids:
Traceback (most recent call last):
...
File "pybtex/database/output/clean_bibtex.py", line 187, in write_stream
stream.write('\n')
File "/usr/lib/python2.6/io.py", line 1491, in write
s.__class__.__name__)
TypeError: can't write str to text stream
- 700. By Trevor King
-
Be more flexible in finding a person for KeyGenerator.generate_key().
- 701. By Trevor King
-
Add crossref and booktitle to Entry.field_order.
- 702. By Trevor King
-
Add clean_bibtex doctest and fix some errors it turned up.
- 703. By Trevor King
-
Make "and" avoidance in whitespace normalization immune to endlines.
This works around problems like:
author = AUthor #" and
"# OTher,
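The key-generation revisions listed above (682 through 689) can be summarized with a rough Python 3 sketch. It assumes entries are reduced to `(old_key, last_name, year, month)` tuples. `base_key`, `letter_suffix`, and `rekey` are hypothetical names chosen for illustration, not the branch's `KeyGenerator` API, and the sketch does not re-check for collisions after adding suffixes as the branch does.

```python
import re


def base_key(last_name, year, von=""):
    """Build <lastname><two-digit year>; hypothetical helper for illustration."""
    if von:
        last = von + last_name      # a von part is prepended and case is kept
    else:
        last = last_name.lower()    # plain last names are lowercased
    last = re.sub(r"[^-a-zA-Z]", "", last)  # keep only letters and '-'
    return last + year[-2:]


def letter_suffix(n):
    """Base-26 letter suffix: 0 -> 'a', 25 -> 'z', 26 -> 'ba'."""
    letters = []
    while True:
        n, r = divmod(n, 26)
        letters.insert(0, chr(ord("a") + r))
        if n == 0:
            return "".join(letters)


def rekey(entries):
    """Map old keys to new ones for (old_key, last_name, year, month) tuples."""
    groups = {}
    for old_key, last_name, year, month in entries:
        key = base_key(last_name, year)
        groups.setdefault(key, []).append((int(year), month, old_key))
    new_keys = {}
    for key, members in groups.items():
        if len(members) == 1:
            new_keys[members[0][2]] = key
            continue
        # Colliding entries are ordered by date, then lettered a, b, c, ...
        for i, (_, _, old_key) in enumerate(sorted(members)):
            new_keys[old_key] = key + letter_suffix(i)
    return new_keys


entries = [
    ("old_key_A", "Uthor", "1234", 2),
    ("old_key_B", "Uthor", "1234", 1),
    ("old_key_C", r'H\"anggi', "1997", 0),
]
for old, new in sorted(rekey(entries).items()):
    print(old, "->", new)
```

Ordering colliding entries by date before lettering them keeps the suffixes stable when the input file is reordered.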
Preview Diff
1 | === modified file 'pybtex/core.py' |
2 | --- pybtex/core.py 2009-09-06 18:24:24 +0000 |
3 | +++ pybtex/core.py 2010-10-17 00:56:45 +0000 |
4 | @@ -17,11 +17,13 @@ |
5 | """ |
6 | |
7 | import re |
8 | +import sys |
9 | +import time |
10 | |
11 | from pybtex.exceptions import PybtexError |
12 | from pybtex.bibtex.utils import split_tex_string |
13 | |
14 | -class FormattedEntry: |
15 | +class FormattedEntry(object): |
16 | """Formatted bibliography entry. Consists of |
17 | - key (which is used for sorting); |
18 | - label (which appears in the resulting bibliography) |
19 | @@ -51,6 +53,16 @@ |
20 | """Bibliography entry. Important members are: |
21 | - persons (a dict of Person objects) |
22 | - fields (all dict of string) |
23 | + |
24 | + >>> e = Entry(type_='article', |
25 | + ... fields=FieldDict(parent=None, year='1234', month='February'), |
26 | + ... persons={'author': [Person('A. Uthor'), Person('O. Ther')]}) |
27 | + >>> e.year() |
28 | + 1234 |
29 | + >>> e.month() |
30 | + 2 |
31 | + >>> e.day() |
32 | + 0 |
33 | """ |
34 | |
35 | def __init__(self, type_, fields=None, persons=None, collection=None): |
36 | @@ -65,6 +77,43 @@ |
37 | |
38 | # for BibTeX interpreter |
39 | self.vars = {} |
40 | + # for output writers |
41 | + self.field_order = [ |
42 | + 'key', |
43 | + 'title', |
44 | + 'crossref', |
45 | + 'booktitle', |
46 | + 'affiliation', |
47 | + 'collaboration', |
48 | + 'year', |
49 | + 'season', |
50 | + 'month', |
51 | + 'day', |
52 | + 'version', |
53 | + 'journal', |
54 | + 'edition', |
55 | + 'volume', |
56 | + 'number', |
57 | + 'pages', |
58 | + 'xxpages', |
59 | + 'numpages', |
60 | + 'isbn', |
61 | + 'isbn-13', |
62 | + 'issn', |
63 | + 'lccn', |
64 | + 'publisher', |
65 | + 'address', |
66 | + 'bibdate', |
67 | + 'copyright', |
68 | + 'eid', |
69 | + 'doi', |
70 | + 'eprint', |
71 | + 'url', |
72 | + 'keywords', |
73 | + 'abstract', |
74 | + 'note', |
75 | + 'project', |
76 | + ] |
77 | |
78 | def get_crossref(self): |
79 | return self.collection.entries[self.fields['crossref']] |
80 | @@ -80,6 +129,43 @@ |
81 | def add_person(self, person, role): |
82 | self.persons.setdefault(role, []).append(person) |
83 | |
84 | + def ordered_field_items(self): |
85 | + def field_index(field_value): |
86 | + field,value = field_value |
87 | + try: |
88 | + return self.field_order.index(field) |
89 | + except ValueError: |
90 | + sys.stderr.write('Unordered field: %s\n' % field) |
91 | + return len(self.field_order) |
92 | + return sorted(self.fields.items(), |
93 | + key=lambda field_value: field_index(field_value)) |
94 | + |
95 | + def year(self): |
96 | + return int(self.fields.get('year', '0')) |
97 | + |
98 | + def month(self): |
99 | + month = self.fields.get('month', '0').lower() |
100 | + if month == '0': |
101 | + month = 0 |
102 | + else: |
103 | + for format in ['%B', '%b', '%m']: |
104 | + try: |
105 | + t = time.strptime(month, format) |
106 | + month = t.tm_mon |
107 | + break |
108 | + except ValueError: |
109 | + continue |
110 | + if not isinstance(month, int): |
111 | + month = 0 # give up parsing month |
112 | + assert month >= 0 and month <= 12, month |
113 | + return month |
114 | + |
115 | + def day(self): |
116 | + day = int(self.fields.get('day', '0')) |
117 | + assert day >= 0 and day <= 31, day |
118 | + return day |
119 | + |
120 | + |
121 | class Person(object): |
122 | """Represents a person (usually human). |
123 | |
124 | |
125 | === modified file 'pybtex/database/__init__.py' |
126 | --- pybtex/database/__init__.py 2010-01-20 19:34:11 +0000 |
127 | +++ pybtex/database/__init__.py 2010-10-17 00:56:45 +0000 |
128 | @@ -13,6 +13,8 @@ |
129 | # You should have received a copy of the GNU General Public License |
130 | # along with this program. If not, see <http://www.gnu.org/licenses/>. |
131 | |
132 | +import re |
133 | + |
134 | from collections import defaultdict |
135 | |
136 | from pybtex.exceptions import PybtexError |
137 | @@ -66,3 +68,138 @@ |
138 | if crossref_count >= min_crossrefs |
139 | ) |
140 | return sorted(citations) |
141 | + |
142 | + |
143 | +class KeyGenerator(object): |
144 | + """ |
145 | + >>> from pybtex.core import FieldDict, Entry, Person |
146 | + >>> bib_data = BibliographyData(entries={ |
147 | + ... 'old_key_A': Entry(type_='article', |
148 | + ... fields=FieldDict(parent=None, year='1234', month='February'), |
149 | + ... persons={'author': [Person('A. Uthor'), Person('O. Ther')]}), |
150 | + ... 'old_key_B': Entry(type_='article', |
151 | + ... fields=FieldDict(parent=None, year='1234', month='January'), |
152 | + ... persons={'author': [Person('A. Uthor'), Person('O. Ther')]}), |
153 | + ... }) |
154 | + >>> kg = KeyGenerator() |
155 | + >>> kg.rekey(bib_data) |
156 | + >>> print sorted(kg.changed_keys.items()) |
157 | + [('old_key_A', 'uthor34b'), ('old_key_B', 'uthor34a')] |
158 | + >>> print sorted(bib_data.entries.items()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE |
159 | + [('uthor34a', <pybtex.core.Entry object at 0x...>), |
160 | + ('uthor34b', <pybtex.core.Entry object at 0x...>)] |
161 | + """ |
162 | + def __init__(self): |
163 | + self.changed_keys = {} # changed_keys[old] = new |
164 | + self.original_keys = {} # original_keys[id(entry)] = original_key |
165 | + |
166 | + def rekey(self, bib_data): |
167 | + new_keys = {} |
168 | + for key, entry in bib_data.entries.iteritems(): |
169 | + self.original_keys[id(entry)] = key |
170 | + new_key = self.generate_key(key, entry) |
171 | + new_keys.setdefault(new_key, []).append(entry) |
172 | + |
173 | + collision = True |
174 | + while collision: |
175 | + collision = False |
176 | + for key,entries in new_keys.items(): |
177 | + if len(entries) > 1: |
178 | + entries = new_keys.pop(key) |
179 | + _new_keys = self.handle_key_collision(key, entries) |
180 | + for entry,new_key in _new_keys.iteritems(): |
181 | + new_keys.setdefault(new_key, []).append(entry) |
182 | + if len(new_keys[new_key]) > 1: |
183 | + collision = True |
184 | + # we've messed with new_keys, restart iteration |
185 | + break |
186 | + |
187 | + for key,entries in new_keys.items(): |
188 | + entry = entries[0] |
189 | + new_keys[key] = entry |
190 | + if key != self.original_keys[id(entry)]: |
191 | + old_key = self.original_keys[id(entry)] |
192 | + self.changed_keys[old_key] = key |
193 | + |
194 | + bib_data.entries = new_keys |
195 | + |
196 | + def generate_key(self, old_key, entry): |
197 | + """ |
198 | + Generate keys that look like |
199 | + `<first-author-lowercase-lastname><two-digit-publication-year>`. |
200 | + |
201 | + >>> from pybtex.core import FieldDict, Entry, Person |
202 | + >>> kg = KeyGenerator() |
203 | + >>> e = Entry(type_='article', |
204 | + ... fields=FieldDict(parent=None, year='1234'), |
205 | + ... persons={'author': [Person('A. Uthor')]}) |
206 | + >>> print kg.generate_key(old_key=None, entry=e) |
207 | + uthor34 |
208 | + |
209 | + Non-letter characters are stripped from the lastname: |
210 | + |
211 | + >>> for name in ['van Kampen, N.G.', 'H\"anggi, Peter']: |
212 | + ... e.persons['author'][0] = Person(name) |
213 | + ... print kg.generate_key(old_key=None, entry=e) |
214 | + vanKampen34 |
215 | + hanggi34 |
216 | + |
217 | + If you want to avoid the auto-generated key, you can set the |
218 | + key explicitly via the `key` field. |
219 | + |
220 | + >>> e.fields['key'] = 'custom_key' |
221 | + >>> print kg.generate_key(old_key=None, entry=e) |
222 | + custom_key |
223 | + """ |
224 | + if 'key' in entry.fields: |
225 | + return entry.fields['key'] |
226 | + for role in ['author', 'editor']: |
227 | + if role in entry.persons: |
228 | + break |
229 | + if role not in entry.persons: |
230 | + raise PybtexError('No person fields in %s' % old_key) |
231 | + first_author = entry.persons[role][0] |
232 | + last = first_author.get_part_as_text('last') |
233 | + von = first_author.get_part_as_text('prelast') |
234 | + if len(von) > 0: |
235 | + last = ''.join([von, last]) |
236 | + else: |
237 | + last = last.lower() |
238 | + last = re.sub('[^-a-zA-Z]', '', last) |
239 | + return '%s%s' % (last, entry.fields['year'][-2:]) |
240 | + |
241 | + def handle_key_collision(self, old_key, entries): |
242 | + entries.sort(key=self._collision_sort_key, ) |
243 | + return dict([(entry, old_key+self._hexavigesimal(i)) |
244 | + for i,entry in enumerate(entries)]) |
245 | + |
246 | + def _collision_sort_key(self, entry): |
247 | + tiebreaker = float('0.%d' % id(entry)) |
248 | + return entry.year()*1e4 + entry.month()*1e2 + entry.day() + tiebreaker |
249 | + |
250 | + def _hexavigesimal(self, n): |
251 | + """ |
252 | + Based on |
253 | + http://www.python-forum.org/pythonforum/viewtopic.php?f=3&t=16144&start=0 |
254 | + |
255 | + >>> kg = KeyGenerator() |
256 | + >>> for n in [0,1,2,25,26,27,51,52,53,678]: |
257 | + ... print "%4d == %s" % (n, kg._hexavigesimal(n)) |
258 | + ... |
259 | + 0 == a |
260 | + 1 == b |
261 | + 2 == c |
262 | + 25 == z |
263 | + 26 == ba |
264 | + 27 == bb |
265 | + 51 == bz |
266 | + 52 == ca |
267 | + 53 == cb |
268 | + 678 == bac |
269 | + """ |
270 | + h = [] |
271 | + while True: |
272 | + n,r = divmod(n,26) |
273 | + h[0:0] = chr(ord('a')+r) |
274 | + if n == 0: |
275 | + return ''.join(h) |
276 | |
277 | === modified file 'pybtex/database/input/bibtex.py' |
278 | --- pybtex/database/input/bibtex.py 2010-09-02 21:18:15 +0000 |
279 | +++ pybtex/database/input/bibtex.py 2010-10-17 00:56:45 +0000 |
280 | @@ -13,10 +13,76 @@ |
281 | # You should have received a copy of the GNU General Public License |
282 | # along with this program. If not, see <http://www.gnu.org/licenses/>. |
283 | |
284 | -"""BibTeX parser""" |
285 | +"""BibTeX parser |
286 | + |
287 | +>>> import StringIO |
288 | +>>> parser = Parser() |
289 | +>>> bib_data = parser.parse_stream(StringIO.StringIO(''' |
290 | +... %@String{COMMENT = "This macro is commented out."} |
291 | +... %@String{INDENTED_COMMENT = "This macro is also commented out."} |
292 | +... @String{SCI = "Science"} |
293 | +... |
294 | +... @String{JFernandez = "Fernandez, Julio M."} |
295 | +... @String{HGaub = "Gaub, Hermann E."} |
296 | +... @String{MGautel = "Gautel, Mathias"} |
297 | +... @String{FOesterhelt = "Oesterhelt, Filipp"} |
298 | +... @String{MRief = "Rief, Matthias"} |
299 | +... |
300 | +... @Article{rief97, |
301 | +... author = MRief #" and "# MGautel #" and "# FOesterhelt |
302 | +... #" and "# JFernandez #" and "# HGaub, |
303 | +... title = "Reversible Unfolding of Individual Titin |
304 | +... Immunoglobulin Domains by {AFM}", |
305 | +... journal = SCI, |
306 | +... volume = 276, |
307 | +... number = 5315, |
308 | +... pages = "1109--1112", |
309 | +... year = 1997, |
310 | +... doi = "10.1126/science.276.5315.1109", |
311 | +... URL = "http://www.sciencemag.org/cgi/content/abstract/276/5315/1109", |
312 | +... eprint = "http://www.sciencemag.org/cgi/reprint/276/5315/1109.pdf", |
313 | +... note = "Percent signs in strings (%) are not treated as comments", |
314 | +... } |
315 | +... ''')) |
316 | + |
317 | +Check that the commented out macro `COMMENT` was not parsed. |
318 | + |
319 | +>>> print sorted(parser.macros.keys()) # doctest: +NORMALIZE_WHITESPACE |
320 | +['apr', 'aug', 'dec', 'feb', 'foesterhelt', 'hgaub', 'jan', 'jfernandez', |
321 | + 'jul', 'jun', 'mar', 'may', 'mgautel', 'mrief', 'nov', 'oct', 'sci', 'sep'] |
322 | + |
323 | +Check that percent signs in strings are not treated as comments. |
324 | + |
325 | +>>> print bib_data.entries['rief97'].fields['note'] |
326 | +Percent signs in strings (%) are not treated as comments |
327 | + |
328 | +To access the macros by their original (not lowercased) name, use |
329 | + |
330 | +>>> print '\\n'.join([str(item) for item |
331 | +... in sorted(parser.get_raw_macros().iteritems())]) |
332 | +... # doctest: +REPORT_UDIFF |
333 | +('FOesterhelt', 'Oesterhelt, Filipp') |
334 | +('HGaub', 'Gaub, Hermann E.') |
335 | +('JFernandez', 'Fernandez, Julio M.') |
336 | +('MGautel', 'Gautel, Mathias') |
337 | +('MRief', 'Rief, Matthias') |
338 | +('SCI', 'Science') |
339 | +('apr', 'April') |
340 | +('aug', 'August') |
341 | +('dec', 'December') |
342 | +('feb', 'February') |
343 | +('jan', 'January') |
344 | +('jul', 'July') |
345 | +('jun', 'June') |
346 | +('mar', 'March') |
347 | +('may', 'May') |
348 | +('nov', 'November') |
349 | +('oct', 'October') |
350 | +('sep', 'September') |
351 | +""" |
352 | |
353 | from pyparsing import ( |
354 | - Word, Literal, CaselessLiteral, CharsNotIn, |
355 | + Word, Literal, CaselessLiteral, CharsNotIn, Regex, |
356 | nums, alphas, alphanums, printables, delimitedList, downcaseTokens, |
357 | Suppress, Combine, Group, Dict, |
358 | Forward, ZeroOrMore, Optional, |
359 | @@ -46,9 +112,13 @@ |
360 | |
361 | file_extension = 'bib' |
362 | |
363 | +def _normalize_whitespace(tok): |
364 | + if tok.strip() == 'and': |
365 | + return ' and ' |
366 | + return textutils.normalize_whitespace(tok) |
367 | |
368 | def normalize_whitespace(s, loc, toks): |
369 | - return [textutils.normalize_whitespace(tok) for tok in toks] |
370 | + return [_normalize_whitespace(tok) for tok in toks] |
371 | |
372 | |
373 | class Parser(ParserBase): |
374 | @@ -60,6 +130,9 @@ |
375 | self.default_macros = dict(macros) |
376 | self.person_fields = person_fields |
377 | |
378 | + comment = Regex(r'%.*') |
379 | + comment.setName("LaTeX style comment") |
380 | + |
381 | lbrace = Suppress('{') |
382 | rbrace = Suppress('}') |
383 | def bibtexGroup(s): |
384 | @@ -78,14 +151,17 @@ |
385 | name_chars = alphanums + '!$&*+-./:;<>?[\\]^_`|~\x7f' |
386 | macro_substitution = Word(name_chars).setParseAction(self.substitute_macro) |
387 | name = Word(name_chars).setParseAction(downcaseTokens) |
388 | + raw_name = Word(name_chars) |
389 | value = Combine(delimitedList(bibTeXString | Word(nums) | macro_substitution, delim='#'), adjacent=False) |
390 | |
391 | #fields |
392 | field = Group(name + Suppress('=') + value) |
393 | + raw_field = Group(raw_name + Suppress('=') + value) |
394 | fields = Dict(delimitedList(field)) |
395 | + raw_fields = Dict(delimitedList(raw_field)) |
396 | |
397 | #String (aka macro) |
398 | - string_body = bibtexGroup(fields) |
399 | + string_body = bibtexGroup(raw_fields) |
400 | string = at + CaselessLiteral('STRING').suppress() + string_body |
401 | string.setParseAction(self.process_macro) |
402 | |
403 | @@ -105,6 +181,7 @@ |
404 | entry.setParseAction(lambda toks: self.process_entry(toks)) |
405 | |
406 | self.BibTeX_entry = string | preamble | entry |
407 | + self.BibTeX_entry.ignore(comment) |
408 | |
409 | def process_preamble(self, toks): |
410 | self.data.add_to_preamble(toks[0]) |
411 | @@ -123,6 +200,8 @@ |
412 | for name in split_name_list(v): |
413 | entry.add_person(Person(name), k) |
414 | else: |
415 | + if k in entry.fields: |
416 | + raise PybtexError('Duplicate fields (%s) in %s.' % (k, key)) |
417 | entry.fields[k] = v |
418 | # return (key, entry) |
419 | self.data.add_entry(key, entry) |
420 | @@ -136,11 +215,13 @@ |
421 | |
422 | def process_macro(self, s, loc, toks): |
423 | self.macros[toks[0][0].lower()] = toks[0][1] |
424 | + self.raw_macros[toks[0][0].lower()] = toks[0][0] |
425 | |
426 | def parse_stream(self, stream): |
427 | self.unnamed_entry_counter = 1 |
428 | |
429 | self.macros = dict(self.default_macros) |
430 | + self.raw_macros = dict([(m,m) for m in self.default_macros.keys()]) |
431 | try: |
432 | # entries = dict(entry[0][0] for entry in self.BibTeX_entry.scanString(s)) |
433 | self.BibTeX_entry.searchString(stream.read()) |
434 | @@ -149,3 +230,8 @@ |
435 | print e |
436 | import sys |
437 | sys.exit(1) |
438 | + return self.data |
439 | + |
440 | + def get_raw_macros(self): |
441 | + return dict([(self.raw_macros[macro], value) |
442 | + for macro,value in self.macros.iteritems()]) |
443 | |
444 | === modified file 'pybtex/database/input/bibtexml.py' |
445 | --- pybtex/database/input/bibtexml.py 2009-08-19 18:35:32 +0000 |
446 | +++ pybtex/database/input/bibtexml.py 2010-10-17 00:56:45 +0000 |
447 | @@ -35,6 +35,7 @@ |
448 | t = ET.parse(stream) |
449 | entries = t.findall(bibtexns + 'entry') |
450 | self.data.add_entries(self.process_entry(entry) for entry in entries) |
451 | + return self.data |
452 | |
453 | def process_entry(self, entry): |
454 | def process_person(person_entry, role): |
455 | |
456 | === modified file 'pybtex/database/input/bibyaml.py' |
457 | --- pybtex/database/input/bibyaml.py 2009-08-19 18:35:32 +0000 |
458 | +++ pybtex/database/input/bibyaml.py 2010-10-17 00:56:45 +0000 |
459 | @@ -32,6 +32,7 @@ |
460 | pass |
461 | |
462 | self.data.add_entries(entries) |
463 | + return self.data |
464 | |
465 | def process_entry(self, entry): |
466 | e = Entry(entry['type']) |
467 | |
468 | === modified file 'pybtex/database/output/__init__.py' |
469 | --- pybtex/database/output/__init__.py 2010-09-05 14:50:17 +0000 |
470 | +++ pybtex/database/output/__init__.py 2010-10-17 00:56:45 +0000 |
471 | @@ -27,11 +27,11 @@ |
472 | def __init__(self, encoding=None): |
473 | self.encoding = encoding |
474 | |
475 | - def write_file(self, bib_data, filename): |
476 | + def write_file(self, bib_data, filename, *args, **kwargs): |
477 | open_file = pybtex.io.open_unicode if self.unicode_io else pybtex.io.open_raw |
478 | mode = 'w' if self.unicode_io else 'wb' |
479 | with open_file(filename, mode, encoding=self.encoding) as stream: |
480 | - self.write_stream(bib_data, stream) |
481 | + self.write_stream(bib_data, stream, *args, **kwargs) |
482 | |
483 | def write_stream(self, bib_data, stream): |
484 | raise NotImplementedError |
485 | |
486 | === modified file 'pybtex/database/output/bibtex.py' |
487 | --- pybtex/database/output/bibtex.py 2010-09-20 07:17:44 +0000 |
488 | +++ pybtex/database/output/bibtex.py 2010-10-17 00:56:45 +0000 |
489 | @@ -49,9 +49,10 @@ |
490 | return '{%s}' % s |
491 | |
492 | def check_braces(self, s): |
493 | - end_brace_level = list(scan_bibtex_string(s))[-1][1] |
494 | - if end_brace_level != 0: |
495 | - raise BibTeXError('String has unmatched braces: %s' % s) |
496 | + if len(s) > 0: |
497 | + end_brace_level = list(scan_bibtex_string(s))[-1][1] |
498 | + if end_brace_level != 0: |
499 | + raise BibTeXError('String has unmatched braces: %s' % s) |
500 | |
501 | def write_stream(self, bib_data, stream): |
502 | def write_field(type, value): |
503 | |
504 | === added file 'pybtex/database/output/clean_bibtex.py' |
505 | --- pybtex/database/output/clean_bibtex.py 1970-01-01 00:00:00 +0000 |
506 | +++ pybtex/database/output/clean_bibtex.py 2010-10-17 00:56:45 +0000 |
507 | @@ -0,0 +1,262 @@ |
508 | +# Copyright (C) 2006, 2007, 2008, 2009 Andrey Golovizin |
509 | +# 2010 W.Trevor King |
510 | +# |
511 | +# This program is free software: you can redistribute it and/or modify |
512 | +# it under the terms of the GNU General Public License as published by |
513 | +# the Free Software Foundation, either version 3 of the License, or |
514 | +# (at your option) any later version. |
515 | +# |
516 | +# This program is distributed in the hope that it will be useful, |
517 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of |
518 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
519 | +# GNU General Public License for more details. |
520 | +# |
521 | +# You should have received a copy of the GNU General Public License |
522 | +# along with this program. If not, see <http://www.gnu.org/licenses/>. |
523 | + |
524 | +import re |
525 | +import sys |
526 | +import textwrap |
527 | + |
528 | +import pybtex.io |
529 | +from pybtex.bibtex.exceptions import BibTeXError |
530 | +from pybtex.database import KeyGenerator |
531 | +from pybtex.database.input.bibtex import Parser |
532 | +from pybtex.database.output.bibtex import Writer as BibtexWriter |
533 | + |
534 | + |
535 | +class Writer(BibtexWriter): |
536 | + """Outputs BibTeX markup |
537 | + |
538 | + Differences from the `.bibtex` module: |
539 | + |
540 | + * Citation keys are generated on the fly via KeyGenerator.rekey(). |
541 | + check |
542 | + * Entries are alphabetized by citation key. |
543 | + check |
544 | + * Citation keys are listed on the same line as the entry declaration. |
545 | + check |
546 | + * Fields within an entry are ordered via Entry.ordered_field_items(). |
547 | + check |
548 | + * Values that do not need to be surrounded by quotes are not. |
549 | + check |
550 | + * Month macros (`jan`, `feb`, ...) are used when possible. |
551 | + check |
552 | + * Names of people, journals, and publishers are broken out into |
553 | + `@String` definitions for consistency. |
554 | + * Long values are wrapped at a configurable character width. |
555 | + check |
556 | + |
557 | + >>> import StringIO |
558 | + >>> import sys |
559 | + >>> parser = Parser() |
560 | + >>> bib_data = parser.parse_stream(StringIO.StringIO(''' |
561 | + ... @String{SCI = "Science"} |
562 | + ... |
563 | + ... @Article{ crazy-key, |
564 | + ... author = "Matthias Rief and Mathias Gautel and Filipp |
565 | + ... Oesterhelt and Julio M. Fernandez and Hermann E. Gaub", |
566 | + ... title = "Reversible Unfolding of Individual Titin |
567 | + ... Immunoglobulin Domains by {AFM}", |
568 | + ... journal = SCI, |
569 | + ... volume = 276, |
570 | + ... number = 5315, |
571 | + ... pages = "1109--1112", |
572 | + ... year = 1997, |
573 | + ... doi = "10.1126/science.276.5315.1109", |
574 | + ... url = "http://www.sciencemag.org/cgi/content/abstract/276/5315/1109", |
575 | + ... eprint = "http://www.sciencemag.org/cgi/reprint/276/5315/1109.pdf", |
576 | + ... } |
577 | + ... ''')) |
578 | + >>> writer = Writer() |
579 | + >>> writer.width = 70 |
580 | + >>> stderr = sys.stderr |
581 | + >>> sys.stderr = StringIO.StringIO() |
582 | + >>> writer.write_stream(bib_data, sys.stdout, parser.get_raw_macros()) |
583 | + ... # doctest: +REPORT_UDIFF |
584 | + @string{JMFernandez = "Fernandez, Julio M."} |
585 | + @string{HEGaub = "Gaub, Hermann E."} |
586 | + @string{MGautel = "Gautel, Mathias"} |
587 | + @string{FOesterhelt = "Oesterhelt, Filipp"} |
588 | + @string{MRief = "Rief, Matthias"} |
589 | + @string{SCI = "Science"} |
590 | + <BLANKLINE> |
591 | + @article { rief97, |
592 | + author = MRief #" and "# MGautel #" and "# FOesterhelt #" and "# |
593 | + JMFernandez #" and "# HEGaub, |
594 | + title = "Reversible Unfolding of Individual Titin Immunoglobulin |
595 | + Domains by {AFM}", |
596 | + year = 1997, |
597 | + journal = SCI, |
598 | + volume = 276, |
599 | + number = 5315, |
600 | + pages = "1109--1112", |
601 | + doi = "10.1126/science.276.5315.1109", |
602 | + eprint = "http://www.sciencemag.org/cgi/reprint/276/5315/1109.pdf", |
603 | + url = |
604 | + "http://www.sciencemag.org/cgi/content/abstract/276/5315/1109" |
605 | + } |
606 | + <BLANKLINE> |
607 | + >>> print sys.stderr.getvalue() # doctest: +REPORT_UDIFF |
608 | + New macros: |
609 | + FOesterhelt = Oesterhelt, Filipp |
610 | + HEGaub = Gaub, Hermann E. |
611 | + JMFernandez = Fernandez, Julio M. |
612 | + MGautel = Gautel, Mathias |
613 | + MRief = Rief, Matthias |
614 | + Changed keys: |
615 | + "crazy-key" -> "rief97" |
616 | + <BLANKLINE> |
617 | + >>> sys.stderr = stderr |
618 | + """ |
619 | + |
620 | + width = 79 |
621 | + |
622 | + def quote(self, s): |
623 | + """ |
624 | + >>> w = Writer() |
625 | + >>> print w.quote('The World') |
626 | + "The World" |
627 | + >>> print w.quote(r'The \emph{World}') |
628 | + "The \emph{World}" |
629 | + >>> print w.quote(r'The "World"') |
630 | + {The "World"} |
631 | + >>> try: |
632 | + ... print w.quote(r'The {World') |
633 | + ... except BibTeXError, error: |
634 | + ... print error |
635 | + String has unmatched braces: The {World |
636 | + >>> print w.quote('1234') |
637 | + 1234 |
638 | + """ |
639 | + try: |
640 | + int(s) |
641 | + return s # no need to quote numbers |
642 | + except ValueError: |
643 | + return BibtexWriter.quote(self, s) |
644 | + |
645 | + def format_name(self, person): |
646 | + def join(l): |
647 | + return ' '.join([name for name in l if name]) |
648 | + first = person.get_part_as_text('first') |
649 | + middle = person.get_part_as_text('middle') |
650 | + prelast = person.get_part_as_text('prelast') |
651 | + last = person.get_part_as_text('last') |
652 | + lineage = person.get_part_as_text('lineage') |
653 | + s = '' |
654 | + if last: |
655 | + s += join([prelast, last]) |
656 | + if lineage: |
657 | + s += ', %s' % lineage |
658 | + if first or middle: |
659 | + s += ', ' |
660 | + s += join([first, middle]) |
661 | + return s |
662 | + |
663 | + def generate_person_macro(self, person): |
664 | + """ |
665 | + Generate macros that look like |
666 | + `<first-caps><middle-caps><prelast><last>' |
667 | + |
668 | + >>> from pybtex.core import Person |
669 | + >>> w = Writer() |
670 | + >>> for name in ['Gaub, Hermann E.', 'van Kampen, N.G.', |
671 | + ... 'H\"anggi, Peter']: |
672 | + ... p = Person(name) |
673 | + ... print w.generate_person_macro(p) |
674 | + HEGaub |
675 | + NGvanKampen |
676 | + PHanggi |
677 | + """ |
678 | + first = person.get_part_as_text('first') |
679 | + middle = person.get_part_as_text('middle') |
680 | + von = person.get_part_as_text('prelast') |
681 | + last = person.get_part_as_text('last') |
682 | + first = re.sub('[^A-Z]', '', first) |
683 | + middle = re.sub('[^A-Z]', '', middle) |
684 | + macro = ''.join([first, middle, von, last]) |
685 | + return re.sub('[^-a-zA-Z]', '', macro) |
686 | + |
687 | + def generate_string_macro(self, string): |
688 | + return re.sub('[^-a-zA-Z]', '', string) |
689 | + |
690 | + def generate_inverse_macros(self, bib_data, macros): |
691 | + new_macros = {} |
692 | + inverse_macros = dict( |
693 | + [(value,macro) for macro,value in macros.iteritems()]) |
694 | + def add_macro(macro, value): |
695 | + if macro in macros: |
696 | + while macro in macros and macros[macro] != value: |
697 | + macro += 'X' |
698 | + new_macros[macro] = value |
699 | + macros[macro] = value |
700 | + inverse_macros[value] = macro |
701 | + for key,entry in bib_data.entries.iteritems(): |
702 | + for role,persons in entry.persons.iteritems(): |
703 | + for person in persons: |
704 | + name = self.format_name(person) |
705 | + if name not in inverse_macros: |
706 | + macro = self.generate_person_macro(person) |
707 | + add_macro(macro, name) |
708 | + for field in ['journal', 'publisher']: |
709 | + if field in entry.fields: |
710 | + value = entry.fields[field] |
711 | + if value not in inverse_macros: |
712 | + macro = self.generate_string_macro(value) |
713 | + add_macro(macro, value) |
714 | + return inverse_macros, new_macros |
715 | + |
716 | + def write_stream(self, bib_data, stream, macros={}): |
717 | + inverse_macros,new_macros = self.generate_inverse_macros( |
718 | + bib_data, macros) |
719 | + sys.stderr.write( |
720 | + 'New macros:\n %s\n' |
721 | + % '\n '.join(['%s = %s' % (macro,value) for macro,value |
722 | + in sorted(new_macros.items())])) |
723 | + def write_field(type, value, raw=False): |
724 | + if raw == True: |
725 | + pass |
726 | + elif value in inverse_macros: |
727 | + value = inverse_macros[value] |
728 | + else: |
729 | + value = self.quote(value) |
730 | + stream.write(u',\n%s' % textwrap.fill( |
731 | + '%s = %s' % (type, value), |
732 | + width=self.width, |
733 | + initial_indent=' '*4, |
734 | + subsequent_indent=' '*8)) |
735 | + def write_persons(persons, role): |
736 | + if persons: |
737 | + write_field(role, u' #" and "# '.join( |
738 | + [inverse_macros[self.format_name(person)] |
739 | + for person in persons]), raw=True) |
740 | + def write_preamble(preamble): |
741 | + if preamble: |
742 | + stream.write(u'@preamble{%s}\n\n' % self.quote(preamble)) |
743 | + |
744 | + kg = KeyGenerator() |
745 | + kg.rekey(bib_data) |
746 | + sys.stderr.write( |
747 | + 'Changed keys:\n %s\n' |
748 | + % '\n '.join(['"%s" -> "%s"' % (old,new) for old,new |
749 | + in sorted(kg.changed_keys.items())])) |
750 | + |
751 | + need_blank_line = False |
752 | + parser = Parser() |
753 | + for value, macro in sorted(inverse_macros.iteritems()): |
754 | + if macro not in parser.default_macros: |
755 | + need_blank_line = True |
756 | + stream.write(u'@string{%s = %s}\n' |
757 | + % (macro, self.quote(value))) |
758 | + if need_blank_line == True: |
759 | + stream.write(u'\n') |
760 | + |
761 | + write_preamble(bib_data.preamble()) |
762 | + |
763 | + for key, entry in sorted(bib_data.entries.items()): |
764 | + stream.write(u'@%s { %s' % (entry.type, key)) |
765 | + for role, persons in entry.persons.iteritems(): |
766 | + write_persons(persons, role) |
767 | + for type, value in entry.ordered_field_items(): |
768 | + write_field(type, value) |
769 | + stream.write(u'\n}\n\n') |
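The macro handling in `clean_bibtex.py` can be approximated with a standalone Python 3 sketch. It assumes names arrive already split into first, middle, von, and last parts. `person_macro`, `add_macro`, and `quote` are simplified stand-ins for the corresponding `Writer` methods, and brace checking is left out.

```python
import re


def person_macro(first, middle, von, last):
    """Build <first-caps><middle-caps><prelast><last>, letters and '-' only."""
    first = re.sub(r"[^A-Z]", "", first)
    middle = re.sub(r"[^A-Z]", "", middle)
    return re.sub(r"[^-a-zA-Z]", "", "".join([first, middle, von, last]))


def add_macro(macros, inverse, name, value):
    """Register value under name, appending 'X' while the name is taken
    by a different value. Returns the name actually used."""
    while name in macros and macros[name] != value:
        name += "X"
    macros[name] = value
    inverse[value] = name
    return name


def quote(value):
    """Leave integers bare; wrap anything else in double quotes."""
    try:
        int(value)
        return value
    except ValueError:
        return '"%s"' % value


macros = {"SCI": "Science"}
inverse = {value: name for name, value in macros.items()}

people = [
    ("Matthias", "", "", "Rief", "Rief, Matthias"),
    ("Hermann", "E.", "", "Gaub", "Gaub, Hermann E."),
]
for first, middle, von, last, formatted in people:
    if formatted not in inverse:
        add_macro(macros, inverse, person_macro(first, middle, von, last), formatted)

# Authors are written as macros joined by a quoted " and " separator.
author = ' #" and "# '.join(inverse[p[4]] for p in people)
print("author = %s" % author)
print("year = %s" % quote("1997"))
print("journal = %s" % inverse.get("Science", quote("Science")))
```

Looking values up in the inverse mapping before quoting is what lets an existing `@String` definition such as `SCI` be reused instead of repeating the literal text.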