Merge lp:~wking-drexel/pybtex/wtk into lp:pybtex

Proposed by Trevor King
Status: Needs review
Proposed branch: lp:~wking-drexel/pybtex/wtk
Merge into: lp:pybtex
Diff against target: 769 lines (+584/-10)
8 files modified
pybtex/core.py (+87/-1)
pybtex/database/__init__.py (+137/-0)
pybtex/database/input/bibtex.py (+90/-4)
pybtex/database/input/bibtexml.py (+1/-0)
pybtex/database/input/bibyaml.py (+1/-0)
pybtex/database/output/__init__.py (+2/-2)
pybtex/database/output/bibtex.py (+4/-3)
pybtex/database/output/clean_bibtex.py (+262/-0)
To merge this branch: bzr merge lp:~wking-drexel/pybtex/wtk
Reviewer: Andrey Golovizin (review status: Pending)
Review via email: mp+38535@code.launchpad.net

Description of the change

Document a bug with whitespace normalization in concatenated BibTeX strings. Add a workaround for the AUthor #" and "# OTher case.

lp:~wking-drexel/pybtex/wtk updated
680. By Trevor King

Don't check end brace level of empty strings in database.output.bibtex.
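
A minimal sketch of why this guard is needed, assuming a scanner that yields one (character, brace level) pair per character, which is how the diff consumes scan_bibtex_string. The scan_brace_levels helper and the use of ValueError are hypothetical stand-ins, not pybtex's own code. Taking the last pair of an empty scan would raise IndexError, so empty strings are skipped:

```python
def scan_brace_levels(s):
    """Yield (char, brace_level) pairs.

    Hypothetical stand-in for pybtex's scan_bibtex_string.
    """
    level = 0
    for char in s:
        if char == '{':
            level += 1
        elif char == '}':
            level -= 1
        yield char, level


def check_braces(s):
    """Raise ValueError if the braces in `s` do not balance."""
    if len(s) > 0:  # an empty string has no final pair to inspect
        end_brace_level = list(scan_brace_levels(s))[-1][1]
        if end_brace_level != 0:
            raise ValueError('String has unmatched braces: %s' % s)


check_braces('')             # passes instead of raising IndexError
check_braces('The {World}')  # balanced, passes
```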

681. By Trevor King

FormattedEntry should subclass object.

Old-style classes are evil!

682. By Trevor King

Add optional automatic key generator to increase database consistency.
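
In the diff, KeyGenerator resolves colliding keys by sorting the entries by date and appending a base-26 letter suffix. The sketch below illustrates that idea under simplifying assumptions: resolve_collision is a hypothetical helper that takes (year, month, day) tuples rather than Entry objects, and it ignores the id()-based tiebreaker used in the branch.

```python
def hexavigesimal(n):
    """Encode a non-negative integer in base 26 using the digits a-z."""
    digits = []
    while True:
        n, r = divmod(n, 26)
        digits.insert(0, chr(ord('a') + r))
        if n == 0:
            return ''.join(digits)


def resolve_collision(key, dates):
    """Map each (year, month, day) tuple to `key` plus a letter suffix.

    Hypothetical simplification: the oldest date gets the suffix 'a'.
    """
    return dict((date, key + hexavigesimal(i))
                for i, date in enumerate(sorted(dates)))


# Two entries that would both be keyed 'uthor34'.
suffixed = resolve_collision('uthor34', [(1234, 2, 0), (1234, 1, 0)])
```

Here the January entry maps to uthor34a and the February entry to uthor34b, matching the KeyGenerator doctest in the diff.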

683. By Trevor King

Store re-keyed entries in the automatic key generator ;).

684. By Trevor King

Strip non-letter characters from author lastnames in auto-generated keys.

685. By Trevor King

Give database authors a mechanism for avoiding auto-generated keys.

686. By Trevor King

Also allow '-' in auto-generated keys.

687. By Trevor King

Add Entry.ordered_field_items() for consistent database output.
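
A standalone sketch of the ordering rule, assuming an abbreviated FIELD_ORDER list (the branch's Entry.field_order is longer) and a plain dict in place of Entry.fields. Known fields sort by their position in the list and unknown fields sort after them. Unlike the diff, this sketch does not log unknown fields to stderr.

```python
# Abbreviated, illustrative order; not the full list from the branch.
FIELD_ORDER = ['key', 'title', 'year', 'journal', 'volume', 'pages',
               'doi', 'url']


def ordered_field_items(fields, field_order=FIELD_ORDER):
    """Return (field, value) pairs sorted by position in `field_order`."""
    def field_index(item):
        field, _value = item
        try:
            return field_order.index(field)
        except ValueError:
            return len(field_order)  # unknown fields go last
    return sorted(fields.items(), key=field_index)


items = ordered_field_items(
    {'url': 'u', 'year': '1997', 'custom': 'x', 'title': 't'})
```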

688. By Trevor King

Raise PybtexError if a duplicate BibTeX field would clobber a previous value.
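
The check can be sketched without pybtex as follows. DuplicateFieldError is a hypothetical stand-in for PybtexError, and add_field is an illustrative helper rather than a function from the branch; only the error message format is taken from the diff.

```python
class DuplicateFieldError(Exception):
    """Hypothetical stand-in for pybtex.exceptions.PybtexError."""


def add_field(fields, name, value, entry_key):
    """Store a field, refusing to overwrite an earlier value."""
    if name in fields:
        raise DuplicateFieldError(
            'Duplicate fields (%s) in %s.' % (name, entry_key))
    fields[name] = value


fields = {}
add_field(fields, 'year', '1997', 'rief97')
try:
    add_field(fields, 'year', '1998', 'rief97')
except DuplicateFieldError as error:
    message = str(error)
```

The first value survives and the second assignment is reported instead of silently replacing it.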

689. By Trevor King

Improve key auto-generation for von-containing names.
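
A self-contained sketch of the key format after this revision, mirroring the logic of KeyGenerator.generate_key() in the diff but taking plain strings instead of Entry and Person objects (that signature is an assumption made for illustration). The lastname is lowercased only when there is no von part, non-letters other than '-' are stripped, and the last two digits of the year are appended.

```python
import re


def generate_key(last, year, von=''):
    """Build a key like <lastname><two-digit-year> from plain strings."""
    if von:
        last = von + last  # keep the original case when a von part exists
    else:
        last = last.lower()
    last = re.sub('[^-a-zA-Z]', '', last)  # letters and '-' only
    return '%s%s' % (last, year[-2:])


keys = [
    generate_key('Uthor', '1234'),
    generate_key('Kampen', '1234', von='van'),
    generate_key('H"anggi', '1234'),
]
```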

690. By Trevor King

Add doctest showing that comments are parsed in BibTeX input.

691. By Trevor King

Simplify comment parsing doctest and add Parser.raw_macros (preserving case).

692. By Trevor King

Handle simple comments in the bibtex parser.

693. By Trevor King

.parse_stream() should return .data to match .parse_file().

694. By Trevor King

Change BibTeX comment regexp to ignore indented comments.

This is at the expense of possibly handling escaped percent signs
(\%), which are not handled correctly at the moment (untested).
Luckily, they probably don't show up too often in BibTeX files.

695. By Trevor King

Demonstrate that the BibTeX parser doesn't treat %s in strings as comments.

696. By Trevor King

Make (raw_macro,value) dict generation a method: .get_raw_macros().
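
The bookkeeping behind .get_raw_macros() can be sketched as two dicts keyed by the lowercased macro name. MacroTable is a hypothetical class used only for illustration; in the branch these attributes and methods live on the BibTeX Parser, and process_macro takes pyparsing tokens rather than a name and a value.

```python
class MacroTable(object):
    """Track macro values alongside the names as originally written."""

    def __init__(self):
        self.macros = {}      # lowercased name -> value
        self.raw_macros = {}  # lowercased name -> name as written

    def process_macro(self, name, value):
        self.macros[name.lower()] = value
        self.raw_macros[name.lower()] = name

    def get_raw_macros(self):
        """Return a dict mapping the original-case name to its value."""
        return dict((self.raw_macros[macro], value)
                    for macro, value in self.macros.items())


table = MacroTable()
table.process_macro('SCI', 'Science')
table.process_macro('MRief', 'Rief, Matthias')
raw = table.get_raw_macros()
```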

697. By Trevor King

Add database.output.clean_bibtex for generating easy-to-maintain output BibTeX.

Example usage to clean up an inconsistent BibTeX file:
    import sys
    from pybtex.database.input.bibtex import Parser
    from pybtex.database.output.clean_bibtex import Writer
    p = Parser()
    d = p.parse_file("/path/to/old/file.bib")
    w = Writer()
    w.write_stream(d, sys.stdout, p.get_raw_macros())
Information about new macros and key changes will be logged to stderr.

698. By Trevor King

Pass extra Writer.write_file() arguments on to Writer.write_stream().

This allows us to pass in a list of macros for the clean_bibtex.Writer.
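
A minimal sketch of the forwarding pattern, assuming simplified writer classes: BaseWriter and MacroWriter are hypothetical, the file is opened with io.open rather than pybtex.io, and the output format is illustrative only. Extra positional and keyword arguments given to write_file() reach the subclass's write_stream() unchanged.

```python
import io
import os
import tempfile


class BaseWriter(object):
    def write_file(self, bib_data, filename, *args, **kwargs):
        with io.open(filename, 'w', encoding='utf-8') as stream:
            # Forward anything extra to the stream writer.
            self.write_stream(bib_data, stream, *args, **kwargs)

    def write_stream(self, bib_data, stream):
        raise NotImplementedError


class MacroWriter(BaseWriter):
    def write_stream(self, bib_data, stream, macros=None):
        for name, value in sorted((macros or {}).items()):
            stream.write(u'@string{%s = "%s"}\n' % (name, value))
        for key in sorted(bib_data):
            stream.write(u'@misc { %s\n}\n' % key)


path = os.path.join(tempfile.mkdtemp(), 'out.bib')
MacroWriter().write_file({'rief97': {}}, path, {'SCI': 'Science'})
with io.open(path, encoding='utf-8') as handle:
    text = handle.read()
```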

699. By Trevor King

Fix unicode/string TypeError in clean_bibtex.Writer.write_file().

This avoids:

Traceback (most recent call last):
  ...
  File "pybtex/database/output/clean_bibtex.py", line 187, in write_stream
    stream.write('\n')
  File "/usr/lib/python2.6/io.py", line 1491, in write
    s.__class__.__name__)
TypeError: can't write str to text stream

700. By Trevor King

Be more flexible in finding a person for KeyGenerator.generate_key().

701. By Trevor King

Add crossref and booktitle to Entry.field_order.

702. By Trevor King

Add clean_bibtex doctest and fix some errors it turned up.

703. By Trevor King

Make "and" avoidance in whitespace normalization immune to endlines.

This works around problems like:
  author = AUthor #" and
           "# OTher,

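The workaround in revision 703 can be illustrated with a standalone sketch. It assumes the concatenated value reaches the normalizer as separate string tokens, and collapse_whitespace is a hypothetical stand-in for textutils.normalize_whitespace. A token that is just "and" surrounded by whitespace (including a line break) is replaced by ' and ' so the names on either side stay separated.

```python
import re


def collapse_whitespace(s):
    """Hypothetical stand-in for textutils.normalize_whitespace."""
    return re.sub(r'\s+', ' ', s.strip())


def normalize_token(tok):
    """Normalize one token, keeping the spaces around a bare 'and'."""
    if tok.strip() == 'and':
        return ' and '
    return collapse_whitespace(tok)


# Pieces of: author = AUthor #" and<newline>           "# OTher
tokens = ['Uthor, A.', ' and\n           ', 'Ther, O.']
naive = ''.join(collapse_whitespace(tok) for tok in tokens)
fixed = ''.join(normalize_token(tok) for tok in tokens)
```

With plain whitespace collapsing the separator loses its surrounding spaces and the two names run together; with the special case the joined value can still be split on " and ".
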
Preview Diff

1=== modified file 'pybtex/core.py'
2--- pybtex/core.py 2009-09-06 18:24:24 +0000
3+++ pybtex/core.py 2010-10-17 00:56:45 +0000
4@@ -17,11 +17,13 @@
5 """
6
7 import re
8+import sys
9+import time
10
11 from pybtex.exceptions import PybtexError
12 from pybtex.bibtex.utils import split_tex_string
13
14-class FormattedEntry:
15+class FormattedEntry(object):
16 """Formatted bibliography entry. Consists of
17 - key (which is used for sorting);
18 - label (which appears in the resulting bibliography)
19@@ -51,6 +53,16 @@
20 """Bibliography entry. Important members are:
21 - persons (a dict of Person objects)
22 - fields (all dict of string)
23+
24+ >>> e = Entry(type_='article',
25+ ... fields=FieldDict(parent=None, year='1234', month='February'),
26+ ... persons={'author': [Person('A. Uthor'), Person('O. Ther')]})
27+ >>> e.year()
28+ 1234
29+ >>> e.month()
30+ 2
31+ >>> e.day()
32+ 0
33 """
34
35 def __init__(self, type_, fields=None, persons=None, collection=None):
36@@ -65,6 +77,43 @@
37
38 # for BibTeX interpreter
39 self.vars = {}
40+ # for output writers
41+ self.field_order = [
42+ 'key',
43+ 'title',
44+ 'crossref',
45+ 'booktitle',
46+ 'affiliation',
47+ 'collaboration',
48+ 'year',
49+ 'season',
50+ 'month',
51+ 'day',
52+ 'version',
53+ 'journal',
54+ 'edition',
55+ 'volume',
56+ 'number',
57+ 'pages',
58+ 'xxpages',
59+ 'numpages',
60+ 'isbn',
61+ 'isbn-13',
62+ 'issn',
63+ 'lccn',
64+ 'publisher',
65+ 'address',
66+ 'bibdate',
67+ 'copyright',
68+ 'eid',
69+ 'doi',
70+ 'eprint',
71+ 'url',
72+ 'keywords',
73+ 'abstract',
74+ 'note',
75+ 'project',
76+ ]
77
78 def get_crossref(self):
79 return self.collection.entries[self.fields['crossref']]
80@@ -80,6 +129,43 @@
81 def add_person(self, person, role):
82 self.persons.setdefault(role, []).append(person)
83
84+ def ordered_field_items(self):
85+ def field_index(field_value):
86+ field,value = field_value
87+ try:
88+ return self.field_order.index(field)
89+ except ValueError:
90+ sys.stderr.write('Unordered field: %s\n' % field)
91+ return len(self.field_order)
92+ return sorted(self.fields.items(),
93+ key=lambda field_value: field_index(field_value))
94+
95+ def year(self):
96+ return int(self.fields.get('year', '0'))
97+
98+ def month(self):
99+ month = self.fields.get('month', '0').lower()
100+ if month == '0':
101+ month = 0
102+ else:
103+ for format in ['%B', '%b', '%m']:
104+ try:
105+ t = time.strptime(month, format)
106+ month = t.tm_mon
107+ break
108+ except ValueError:
109+ continue
110+ if not isinstance(month, int):
111+ month = 0 # give up parsing month
112+ assert month >= 0 and month <= 12, month
113+ return month
114+
115+ def day(self):
116+ day = int(self.fields.get('day', '0'))
117+ assert day >= 0 and day <= 31, day
118+ return day
119+
120+
121 class Person(object):
122 """Represents a person (usually human).
123
124
125=== modified file 'pybtex/database/__init__.py'
126--- pybtex/database/__init__.py 2010-01-20 19:34:11 +0000
127+++ pybtex/database/__init__.py 2010-10-17 00:56:45 +0000
128@@ -13,6 +13,8 @@
129 # You should have received a copy of the GNU General Public License
130 # along with this program. If not, see <http://www.gnu.org/licenses/>.
131
132+import re
133+
134 from collections import defaultdict
135
136 from pybtex.exceptions import PybtexError
137@@ -66,3 +68,138 @@
138 if crossref_count >= min_crossrefs
139 )
140 return sorted(citations)
141+
142+
143+class KeyGenerator(object):
144+ """
145+ >>> from pybtex.core import FieldDict, Entry, Person
146+ >>> bib_data = BibliographyData(entries={
147+ ... 'old_key_A': Entry(type_='article',
148+ ... fields=FieldDict(parent=None, year='1234', month='February'),
149+ ... persons={'author': [Person('A. Uthor'), Person('O. Ther')]}),
150+ ... 'old_key_B': Entry(type_='article',
151+ ... fields=FieldDict(parent=None, year='1234', month='January'),
152+ ... persons={'author': [Person('A. Uthor'), Person('O. Ther')]}),
153+ ... })
154+ >>> kg = KeyGenerator()
155+ >>> kg.rekey(bib_data)
156+ >>> print sorted(kg.changed_keys.items())
157+ [('old_key_A', 'uthor34b'), ('old_key_B', 'uthor34a')]
158+ >>> print sorted(bib_data.entries.items()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
159+ [('uthor34a', <pybtex.core.Entry object at 0x...>),
160+ ('uthor34b', <pybtex.core.Entry object at 0x...>)]
161+ """
162+ def __init__(self):
163+ self.changed_keys = {} # changed_keys[old] = new
164+ self.original_keys = {} # original_keys[id(entry)] = original_key
165+
166+ def rekey(self, bib_data):
167+ new_keys = {}
168+ for key, entry in bib_data.entries.iteritems():
169+ self.original_keys[id(entry)] = key
170+ new_key = self.generate_key(key, entry)
171+ new_keys.setdefault(new_key, []).append(entry)
172+
173+ collision = True
174+ while collision:
175+ collision = False
176+ for key,entries in new_keys.items():
177+ if len(entries) > 1:
178+ entries = new_keys.pop(key)
179+ _new_keys = self.handle_key_collision(key, entries)
180+ for entry,new_key in _new_keys.iteritems():
181+ new_keys.setdefault(new_key, []).append(entry)
182+ if len(new_keys[new_key]) > 1:
183+ collision = True
184+ # we've messed with new_keys, restart iteration
185+ break
186+
187+ for key,entries in new_keys.items():
188+ entry = entries[0]
189+ new_keys[key] = entry
190+ if key != self.original_keys[id(entry)]:
191+ old_key = self.original_keys[id(entry)]
192+ self.changed_keys[old_key] = key
193+
194+ bib_data.entries = new_keys
195+
196+ def generate_key(self, old_key, entry):
197+ """
198+ Generate keys that look like
199+ `<first-author-lowercase-lastname><two-digit-publication-year>`.
200+
201+ >>> from pybtex.core import FieldDict, Entry, Person
202+ >>> kg = KeyGenerator()
203+ >>> e = Entry(type_='article',
204+ ... fields=FieldDict(parent=None, year='1234'),
205+ ... persons={'author': [Person('A. Uthor')]})
206+ >>> print kg.generate_key(old_key=None, entry=e)
207+ uthor34
208+
209+ Non-letter characters are stripped from the lastname:
210+
211+ >>> for name in ['van Kampen, N.G.', 'H\"anggi, Peter']:
212+ ... e.persons['author'][0] = Person(name)
213+ ... print kg.generate_key(old_key=None, entry=e)
214+ vanKampen34
215+ hanggi34
216+
217+ If you want to avoid the auto-generated key, you can set the
218+ key explicitly via the `key` field.
219+
220+ >>> e.fields['key'] = 'custom_key'
221+ >>> print kg.generate_key(old_key=None, entry=e)
222+ custom_key
223+ """
224+ if 'key' in entry.fields:
225+ return entry.fields['key']
226+ for role in ['author', 'editor']:
227+ if role in entry.persons:
228+ break
229+ if role not in entry.persons:
230+ raise PybtexError('No person fields in %s' % old_key)
231+ first_author = entry.persons[role][0]
232+ last = first_author.get_part_as_text('last')
233+ von = first_author.get_part_as_text('prelast')
234+ if len(von) > 0:
235+ last = ''.join([von, last])
236+ else:
237+ last = last.lower()
238+ last = re.sub('[^-a-zA-Z]', '', last)
239+ return '%s%s' % (last, entry.fields['year'][-2:])
240+
241+ def handle_key_collision(self, old_key, entries):
242+ entries.sort(key=self._collision_sort_key, )
243+ return dict([(entry, old_key+self._hexavigesimal(i))
244+ for i,entry in enumerate(entries)])
245+
246+ def _collision_sort_key(self, entry):
247+ tiebreaker = float('0.%d' % id(entry))
248+ return entry.year()*1e4 + entry.month()*1e2 + entry.day() + tiebreaker
249+
250+ def _hexavigesimal(self, n):
251+ """
252+ Based on
253+ http://www.python-forum.org/pythonforum/viewtopic.php?f=3&t=16144&start=0
254+
255+ >>> kg = KeyGenerator()
256+ >>> for n in [0,1,2,25,26,27,51,52,53,678]:
257+ ... print "%4d == %s" % (n, kg._hexavigesimal(n))
258+ ...
259+ 0 == a
260+ 1 == b
261+ 2 == c
262+ 25 == z
263+ 26 == ba
264+ 27 == bb
265+ 51 == bz
266+ 52 == ca
267+ 53 == cb
268+ 678 == bac
269+ """
270+ h = []
271+ while True:
272+ n,r = divmod(n,26)
273+ h[0:0] = chr(ord('a')+r)
274+ if n == 0:
275+ return ''.join(h)
276
277=== modified file 'pybtex/database/input/bibtex.py'
278--- pybtex/database/input/bibtex.py 2010-09-02 21:18:15 +0000
279+++ pybtex/database/input/bibtex.py 2010-10-17 00:56:45 +0000
280@@ -13,10 +13,76 @@
281 # You should have received a copy of the GNU General Public License
282 # along with this program. If not, see <http://www.gnu.org/licenses/>.
283
284-"""BibTeX parser"""
285+"""BibTeX parser
286+
287+>>> import StringIO
288+>>> parser = Parser()
289+>>> bib_data = parser.parse_stream(StringIO.StringIO('''
290+... %@String{COMMENT = "This macro is commented out."}
291+... %@String{INDENTED_COMMENT = "This macro is also commented out."}
292+... @String{SCI = "Science"}
293+...
294+... @String{JFernandez = "Fernandez, Julio M."}
295+... @String{HGaub = "Gaub, Hermann E."}
296+... @String{MGautel = "Gautel, Mathias"}
297+... @String{FOesterhelt = "Oesterhelt, Filipp"}
298+... @String{MRief = "Rief, Matthias"}
299+...
300+... @Article{rief97,
301+... author = MRief #" and "# MGautel #" and "# FOesterhelt
302+... #" and "# JFernandez #" and "# HGaub,
303+... title = "Reversible Unfolding of Individual Titin
304+... Immunoglobulin Domains by {AFM}",
305+... journal = SCI,
306+... volume = 276,
307+... number = 5315,
308+... pages = "1109--1112",
309+... year = 1997,
310+... doi = "10.1126/science.276.5315.1109",
311+... URL = "http://www.sciencemag.org/cgi/content/abstract/276/5315/1109",
312+... eprint = "http://www.sciencemag.org/cgi/reprint/276/5315/1109.pdf",
313+... note = "Percent signs in strings (%) are not treated as comments",
314+... }
315+... '''))
316+
317+Check that the commented out macro `COMMENT` was not parsed.
318+
319+>>> print sorted(parser.macros.keys()) # doctest: +NORMALIZE_WHITESPACE
320+['apr', 'aug', 'dec', 'feb', 'foesterhelt', 'hgaub', 'jan', 'jfernandez',
321+ 'jul', 'jun', 'mar', 'may', 'mgautel', 'mrief', 'nov', 'oct', 'sci', 'sep']
322+
323+Check that percent signs in strings are not treated as comments.
324+
325+>>> print bib_data.entries['rief97'].fields['note']
326+Percent signs in strings (%) are not treated as comments
327+
328+To access the macros by their original (not lowercased) name, use
329+
330+>>> print '\\n'.join([str(item) for item
331+... in sorted(parser.get_raw_macros().iteritems())])
332+... # doctest: +REPORT_UDIFF
333+('FOesterhelt', 'Oesterhelt, Filipp')
334+('HGaub', 'Gaub, Hermann E.')
335+('JFernandez', 'Fernandez, Julio M.')
336+('MGautel', 'Gautel, Mathias')
337+('MRief', 'Rief, Matthias')
338+('SCI', 'Science')
339+('apr', 'April')
340+('aug', 'August')
341+('dec', 'December')
342+('feb', 'February')
343+('jan', 'January')
344+('jul', 'July')
345+('jun', 'June')
346+('mar', 'March')
347+('may', 'May')
348+('nov', 'November')
349+('oct', 'October')
350+('sep', 'September')
351+"""
352
353 from pyparsing import (
354- Word, Literal, CaselessLiteral, CharsNotIn,
355+ Word, Literal, CaselessLiteral, CharsNotIn, Regex,
356 nums, alphas, alphanums, printables, delimitedList, downcaseTokens,
357 Suppress, Combine, Group, Dict,
358 Forward, ZeroOrMore, Optional,
359@@ -46,9 +112,13 @@
360
361 file_extension = 'bib'
362
363+def _normalize_whitespace(tok):
364+ if tok.strip() == 'and':
365+ return ' and '
366+ return textutils.normalize_whitespace(tok)
367
368 def normalize_whitespace(s, loc, toks):
369- return [textutils.normalize_whitespace(tok) for tok in toks]
370+ return [_normalize_whitespace(tok) for tok in toks]
371
372
373 class Parser(ParserBase):
374@@ -60,6 +130,9 @@
375 self.default_macros = dict(macros)
376 self.person_fields = person_fields
377
378+ comment = Regex(r'%.*')
379+ comment.setName("LaTeX style comment")
380+
381 lbrace = Suppress('{')
382 rbrace = Suppress('}')
383 def bibtexGroup(s):
384@@ -78,14 +151,17 @@
385 name_chars = alphanums + '!$&*+-./:;<>?[\\]^_`|~\x7f'
386 macro_substitution = Word(name_chars).setParseAction(self.substitute_macro)
387 name = Word(name_chars).setParseAction(downcaseTokens)
388+ raw_name = Word(name_chars)
389 value = Combine(delimitedList(bibTeXString | Word(nums) | macro_substitution, delim='#'), adjacent=False)
390
391 #fields
392 field = Group(name + Suppress('=') + value)
393+ raw_field = Group(raw_name + Suppress('=') + value)
394 fields = Dict(delimitedList(field))
395+ raw_fields = Dict(delimitedList(raw_field))
396
397 #String (aka macro)
398- string_body = bibtexGroup(fields)
399+ string_body = bibtexGroup(raw_fields)
400 string = at + CaselessLiteral('STRING').suppress() + string_body
401 string.setParseAction(self.process_macro)
402
403@@ -105,6 +181,7 @@
404 entry.setParseAction(lambda toks: self.process_entry(toks))
405
406 self.BibTeX_entry = string | preamble | entry
407+ self.BibTeX_entry.ignore(comment)
408
409 def process_preamble(self, toks):
410 self.data.add_to_preamble(toks[0])
411@@ -123,6 +200,8 @@
412 for name in split_name_list(v):
413 entry.add_person(Person(name), k)
414 else:
415+ if k in entry.fields:
416+ raise PybtexError('Duplicate fields (%s) in %s.' % (k, key))
417 entry.fields[k] = v
418 # return (key, entry)
419 self.data.add_entry(key, entry)
420@@ -136,11 +215,13 @@
421
422 def process_macro(self, s, loc, toks):
423 self.macros[toks[0][0].lower()] = toks[0][1]
424+ self.raw_macros[toks[0][0].lower()] = toks[0][0]
425
426 def parse_stream(self, stream):
427 self.unnamed_entry_counter = 1
428
429 self.macros = dict(self.default_macros)
430+ self.raw_macros = dict([(m,m) for m in self.default_macros.keys()])
431 try:
432 # entries = dict(entry[0][0] for entry in self.BibTeX_entry.scanString(s))
433 self.BibTeX_entry.searchString(stream.read())
434@@ -149,3 +230,8 @@
435 print e
436 import sys
437 sys.exit(1)
438+ return self.data
439+
440+ def get_raw_macros(self):
441+ return dict([(self.raw_macros[macro], value)
442+ for macro,value in self.macros.iteritems()])
443
444=== modified file 'pybtex/database/input/bibtexml.py'
445--- pybtex/database/input/bibtexml.py 2009-08-19 18:35:32 +0000
446+++ pybtex/database/input/bibtexml.py 2010-10-17 00:56:45 +0000
447@@ -35,6 +35,7 @@
448 t = ET.parse(stream)
449 entries = t.findall(bibtexns + 'entry')
450 self.data.add_entries(self.process_entry(entry) for entry in entries)
451+ return self.data
452
453 def process_entry(self, entry):
454 def process_person(person_entry, role):
455
456=== modified file 'pybtex/database/input/bibyaml.py'
457--- pybtex/database/input/bibyaml.py 2009-08-19 18:35:32 +0000
458+++ pybtex/database/input/bibyaml.py 2010-10-17 00:56:45 +0000
459@@ -32,6 +32,7 @@
460 pass
461
462 self.data.add_entries(entries)
463+ return self.data
464
465 def process_entry(self, entry):
466 e = Entry(entry['type'])
467
468=== modified file 'pybtex/database/output/__init__.py'
469--- pybtex/database/output/__init__.py 2010-09-05 14:50:17 +0000
470+++ pybtex/database/output/__init__.py 2010-10-17 00:56:45 +0000
471@@ -27,11 +27,11 @@
472 def __init__(self, encoding=None):
473 self.encoding = encoding
474
475- def write_file(self, bib_data, filename):
476+ def write_file(self, bib_data, filename, *args, **kwargs):
477 open_file = pybtex.io.open_unicode if self.unicode_io else pybtex.io.open_raw
478 mode = 'w' if self.unicode_io else 'wb'
479 with open_file(filename, mode, encoding=self.encoding) as stream:
480- self.write_stream(bib_data, stream)
481+ self.write_stream(bib_data, stream, *args, **kwargs)
482
483 def write_stream(self, bib_data, stream):
484 raise NotImplementedError
485
486=== modified file 'pybtex/database/output/bibtex.py'
487--- pybtex/database/output/bibtex.py 2010-09-20 07:17:44 +0000
488+++ pybtex/database/output/bibtex.py 2010-10-17 00:56:45 +0000
489@@ -49,9 +49,10 @@
490 return '{%s}' % s
491
492 def check_braces(self, s):
493- end_brace_level = list(scan_bibtex_string(s))[-1][1]
494- if end_brace_level != 0:
495- raise BibTeXError('String has unmatched braces: %s' % s)
496+ if len(s) > 0:
497+ end_brace_level = list(scan_bibtex_string(s))[-1][1]
498+ if end_brace_level != 0:
499+ raise BibTeXError('String has unmatched braces: %s' % s)
500
501 def write_stream(self, bib_data, stream):
502 def write_field(type, value):
503
504=== added file 'pybtex/database/output/clean_bibtex.py'
505--- pybtex/database/output/clean_bibtex.py 1970-01-01 00:00:00 +0000
506+++ pybtex/database/output/clean_bibtex.py 2010-10-17 00:56:45 +0000
507@@ -0,0 +1,262 @@
508+# Copyright (C) 2006, 2007, 2008, 2009 Andrey Golovizin
509+# 2010 W.Trevor King
510+#
511+# This program is free software: you can redistribute it and/or modify
512+# it under the terms of the GNU General Public License as published by
513+# the Free Software Foundation, either version 3 of the License, or
514+# (at your option) any later version.
515+#
516+# This program is distributed in the hope that it will be useful,
517+# but WITHOUT ANY WARRANTY; without even the implied warranty of
518+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
519+# GNU General Public License for more details.
520+#
521+# You should have received a copy of the GNU General Public License
522+# along with this program. If not, see <http://www.gnu.org/licenses/>.
523+
524+import re
525+import sys
526+import textwrap
527+
528+import pybtex.io
529+from pybtex.bibtex.exceptions import BibTeXError
530+from pybtex.database import KeyGenerator
531+from pybtex.database.input.bibtex import Parser
532+from pybtex.database.output.bibtex import Writer as BibtexWriter
533+
534+
535+class Writer(BibtexWriter):
536+ """Outputs BibTeX markup
537+
538+ Differences from the `.bibtex` module:
539+
540+ * Citation keys are generated on the fly via KeyGenerator.rekey().
541+ check
542+ * Entries are alphabetized by citation key.
543+ check
544+ * Citation keys are listed on the same line as the entry declaration.
545+ check
546+ * Fields within an entry are ordered via Entry.ordered_field_items().
547+ check
548+ * Values that do not need to be surrounded by quotes are not.
549+ check
550+ * Month macros (`jan`, `feb`, ...) are used when possible.
551+ check
552+ * Names of people, journals, and publishers are broken out into
553+ `@String` definitions for consistency.
554+ * Long values are wrapped at a configurable character width.
555+ check
556+
557+ >>> import StringIO
558+ >>> import sys
559+ >>> parser = Parser()
560+ >>> bib_data = parser.parse_stream(StringIO.StringIO('''
561+ ... @String{SCI = "Science"}
562+ ...
563+ ... @Article{ crazy-key,
564+ ... author = "Matthias Rief and Mathias Gautel and Filipp
565+ ... Oesterhelt and Julio M. Fernandez and Hermann E. Gaub",
566+ ... title = "Reversible Unfolding of Individual Titin
567+ ... Immunoglobulin Domains by {AFM}",
568+ ... journal = SCI,
569+ ... volume = 276,
570+ ... number = 5315,
571+ ... pages = "1109--1112",
572+ ... year = 1997,
573+ ... doi = "10.1126/science.276.5315.1109",
574+ ... url = "http://www.sciencemag.org/cgi/content/abstract/276/5315/1109",
575+ ... eprint = "http://www.sciencemag.org/cgi/reprint/276/5315/1109.pdf",
576+ ... }
577+ ... '''))
578+ >>> writer = Writer()
579+ >>> writer.width = 70
580+ >>> stderr = sys.stderr
581+ >>> sys.stderr = StringIO.StringIO()
582+ >>> writer.write_stream(bib_data, sys.stdout, parser.get_raw_macros())
583+ ... # doctest: +REPORT_UDIFF
584+ @string{JMFernandez = "Fernandez, Julio M."}
585+ @string{HEGaub = "Gaub, Hermann E."}
586+ @string{MGautel = "Gautel, Mathias"}
587+ @string{FOesterhelt = "Oesterhelt, Filipp"}
588+ @string{MRief = "Rief, Matthias"}
589+ @string{SCI = "Science"}
590+ <BLANKLINE>
591+ @article { rief97,
592+ author = MRief #" and "# MGautel #" and "# FOesterhelt #" and "#
593+ JMFernandez #" and "# HEGaub,
594+ title = "Reversible Unfolding of Individual Titin Immunoglobulin
595+ Domains by {AFM}",
596+ year = 1997,
597+ journal = SCI,
598+ volume = 276,
599+ number = 5315,
600+ pages = "1109--1112",
601+ doi = "10.1126/science.276.5315.1109",
602+ eprint = "http://www.sciencemag.org/cgi/reprint/276/5315/1109.pdf",
603+ url =
604+ "http://www.sciencemag.org/cgi/content/abstract/276/5315/1109"
605+ }
606+ <BLANKLINE>
607+ >>> print sys.stderr.getvalue() # doctest: +REPORT_UDIFF
608+ New macros:
609+ FOesterhelt = Oesterhelt, Filipp
610+ HEGaub = Gaub, Hermann E.
611+ JMFernandez = Fernandez, Julio M.
612+ MGautel = Gautel, Mathias
613+ MRief = Rief, Matthias
614+ Changed keys:
615+ "crazy-key" -> "rief97"
616+ <BLANKLINE>
617+ >>> sys.stderr = stderr
618+ """
619+
620+ width = 79
621+
622+ def quote(self, s):
623+ """
624+ >>> w = Writer()
625+ >>> print w.quote('The World')
626+ "The World"
627+ >>> print w.quote(r'The \emph{World}')
628+ "The \emph{World}"
629+ >>> print w.quote(r'The "World"')
630+ {The "World"}
631+ >>> try:
632+ ... print w.quote(r'The {World')
633+ ... except BibTeXError, error:
634+ ... print error
635+ String has unmatched braces: The {World
636+ >>> print w.quote('1234')
637+ 1234
638+ """
639+ try:
640+ int(s)
641+ return s # no need to quote numbers
642+ except ValueError:
643+ return BibtexWriter.quote(self, s)
644+
645+ def format_name(self, person):
646+ def join(l):
647+ return ' '.join([name for name in l if name])
648+ first = person.get_part_as_text('first')
649+ middle = person.get_part_as_text('middle')
650+ prelast = person.get_part_as_text('prelast')
651+ last = person.get_part_as_text('last')
652+ lineage = person.get_part_as_text('lineage')
653+ s = ''
654+ if last:
655+ s += join([prelast, last])
656+ if lineage:
657+ s += ', %s' % lineage
658+ if first or middle:
659+ s += ', '
660+ s += join([first, middle])
661+ return s
662+
663+ def generate_person_macro(self, person):
664+ """
665+ Generate macros that look like
666+ `<first-caps><middle-caps><prelast><last>`
667+
668+ >>> from pybtex.core import Person
669+ >>> w = Writer()
670+ >>> for name in ['Gaub, Hermann E.', 'van Kampen, N.G.',
671+ ... 'H\"anggi, Peter']:
672+ ... p = Person(name)
673+ ... print w.generate_person_macro(p)
674+ HEGaub
675+ NGvanKampen
676+ PHanggi
677+ """
678+ first = person.get_part_as_text('first')
679+ middle = person.get_part_as_text('middle')
680+ von = person.get_part_as_text('prelast')
681+ last = person.get_part_as_text('last')
682+ first = re.sub('[^A-Z]', '', first)
683+ middle = re.sub('[^A-Z]', '', middle)
684+ macro = ''.join([first, middle, von, last])
685+ return re.sub('[^-a-zA-Z]', '', macro)
686+
687+ def generate_string_macro(self, string):
688+ return re.sub('[^-a-zA-Z]', '', string)
689+
690+ def generate_inverse_macros(self, bib_data, macros):
691+ new_macros = {}
692+ inverse_macros = dict(
693+ [(value,macro) for macro,value in macros.iteritems()])
694+ def add_macro(macro, value):
695+ if macro in macros:
696+ while macro in macros and macros[macro] != value:
697+ macro += 'X'
698+ new_macros[macro] = value
699+ macros[macro] = value
700+ inverse_macros[value] = macro
701+ for key,entry in bib_data.entries.iteritems():
702+ for role,persons in entry.persons.iteritems():
703+ for person in persons:
704+ name = self.format_name(person)
705+ if name not in inverse_macros:
706+ macro = self.generate_person_macro(person)
707+ add_macro(macro, name)
708+ for field in ['journal', 'publisher']:
709+ if field in entry.fields:
710+ value = entry.fields[field]
711+ if value not in inverse_macros:
712+ macro = self.generate_string_macro(value)
713+ add_macro(macro, value)
714+ return inverse_macros, new_macros
715+
716+ def write_stream(self, bib_data, stream, macros={}):
717+ inverse_macros,new_macros = self.generate_inverse_macros(
718+ bib_data, macros)
719+ sys.stderr.write(
720+ 'New macros:\n %s\n'
721+ % '\n '.join(['%s = %s' % (macro,value) for macro,value
722+ in sorted(new_macros.items())]))
723+ def write_field(type, value, raw=False):
724+ if raw == True:
725+ pass
726+ elif value in inverse_macros:
727+ value = inverse_macros[value]
728+ else:
729+ value = self.quote(value)
730+ stream.write(u',\n%s' % textwrap.fill(
731+ '%s = %s' % (type, value),
732+ width=self.width,
733+ initial_indent=' '*4,
734+ subsequent_indent=' '*8))
735+ def write_persons(persons, role):
736+ if persons:
737+ write_field(role, u' #" and "# '.join(
738+ [inverse_macros[self.format_name(person)]
739+ for person in persons]), raw=True)
740+ def write_preamble(preamble):
741+ if preamble:
742+ stream.write(u'@preamble{%s}\n\n' % self.quote(preamble))
743+
744+ kg = KeyGenerator()
745+ kg.rekey(bib_data)
746+ sys.stderr.write(
747+ 'Changed keys:\n %s\n'
748+ % '\n '.join(['"%s" -> "%s"' % (old,new) for old,new
749+ in sorted(kg.changed_keys.items())]))
750+
751+ need_blank_line = False
752+ parser = Parser()
753+ for value, macro in sorted(inverse_macros.iteritems()):
754+ if macro not in parser.default_macros:
755+ need_blank_line = True
756+ stream.write(u'@string{%s = %s}\n'
757+ % (macro, self.quote(value)))
758+ if need_blank_line == True:
759+ stream.write(u'\n')
760+
761+ write_preamble(bib_data.preamble())
762+
763+ for key, entry in sorted(bib_data.entries.items()):
764+ stream.write(u'@%s { %s' % (entry.type, key))
765+ for role, persons in entry.persons.iteritems():
766+ write_persons(persons, role)
767+ for type, value in entry.ordered_field_items():
768+ write_field(type, value)
769+ stream.write(u'\n}\n\n')
