Merge lp:~spiv/bzr/better-news-merge into lp:bzr

Proposed by Andrew Bennetts
Status: Work in progress
Proposed branch: lp:~spiv/bzr/better-news-merge
Merge into: lp:bzr
Diff against target: 747 lines (+587/-64)
7 files modified
NEWS (+4/-0)
bzrlib/merge3.py (+18/-4)
bzrlib/plugins/news_merge/__init__.py (+22/-2)
bzrlib/plugins/news_merge/news_merge.py (+230/-57)
bzrlib/plugins/news_merge/parser.py (+196/-1)
bzrlib/plugins/news_merge/tests/__init__.py (+1/-0)
bzrlib/plugins/news_merge/tests/test_parser.py (+116/-0)
To merge this branch: bzr merge lp:~spiv/bzr/better-news-merge
Reviewer Review Type Date Requested Status
Robert Collins (community) Needs Fixing
Review via email: mp+19247@code.launchpad.net

This proposal supersedes a proposal from 2010-02-12.

To post a comment you must log in.
Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

This patch makes the news_merge plugin more capable. Hopefully the changes to comments and docstrings in the patch explain the changes well enough, but in short it makes it capable of coping with conflicts that span section headings (like 'Bug Fixes'), whereas before it only dealt with conflicts between bullet points within a section. See the comments and code for the details (and limitations and tradeoffs).

With this change I expect it news_merge will handle the vast majority of NEWS merges.

A minor orthogonal change adds a couple of trivial mutters to let you a reader of ~/.bzr.log know when news_merge has been used.

Revision history for this message
Robert Collins (lifeless) wrote : Posted in a previous version of this proposal

The patch seems to have conflicts and a lot of noise :(

-Rob

Revision history for this message
Andrew Bennetts (spiv) wrote : Posted in a previous version of this proposal

Oh, I wrote the code against 2.1, but proposed against lp:bzr. Hmm.

Revision history for this message
Andrew Bennetts (spiv) wrote :

Resubmitted in hope of getting a better diff now that lp:bzr/2.1 (which this branch was based on) has been merged to lp:bzr.

Original proposal:

"""
This patch makes the news_merge plugin more capable. Hopefully the changes to comments and docstrings in the patch explain the changes well enough, but in short it makes it capable of coping with conflicts that span section headings (like 'Bug Fixes'), whereas before it only dealt with conflicts between bullet points within a section. See the comments and code for the details (and limitations and tradeoffs).

With this change I expect it news_merge will handle the vast majority of NEWS merges.

A minor orthogonal change adds a couple of trivial mutters to let you a reader of ~/.bzr.log know when news_merge has been used.
"""

Revision history for this message
Robert Collins (lifeless) wrote :

On Sat, 2010-02-13 at 01:30 +0000, Andrew Bennetts wrote:
>
> magic_marker = '|NEWS-MERGE-MAGIC-MARKER|'
>
> +# The order sections are supposed to appear in. See the template at
> the
> +# bottom of NEWS. None is a placeholder for an unseen section
> heading.
> +canonical_section_order = [
> + None, 'Compatibility Breaks', 'New Features', 'Bug Fixes',
> 'Improvements',
> + 'Documentation', 'API Changes', 'Internals', 'Testing']

This is duplicated with the template; perhaps you could use the template
instead? That would make this usable by other projects.
...
> + # Are all the conflicting lines bullets or sections?
> If so, we
> + # can merge this.
> + try:
> + base_sections =
> munged_lines_to_section_dict(base)
> + a_sections = munged_lines_to_section_dict(a)
> + b_sections = munged_lines_to_section_dict(b)
> + except MergeTooHard:
> + # Something else :(
> + # Maybe the default merge can cope.
> + trace.mutter('news_merge giving up')
> + return 'not_applicable', None

In the NEWS entry you aren't entirely clear about the implications of
'using bzr's builtin merge'. .. from the code it looks like 'if there
are conflicts outside the structured section none of the news file is
smart merged'. Perhaps we could merge just the non-section data with
bzr's built in merge, or make the NEWS entry clearer.
...
> # Transform the merged elements back into real blocks of
> lines.
> + trace.mutter('news_merge giving up')
> return 'success', list(fakelines_to_blocks(result_lines))

This mutter seems...wrong.

 review: needsfixing

review: Needs Fixing
lp:~spiv/bzr/better-news-merge updated
4815. By Andrew Bennetts

Read section order from template file named in config.

4816. By Andrew Bennetts

Add a XXX comment for future improvement.

4817. By Andrew Bennetts

First steps towards a better NEWS file parser.

4818. By Andrew Bennetts

Rename test.

4819. By Andrew Bennetts

Merge lp:bzr.

4820. By Andrew Bennetts

Merge object-3way-merge.

4821. By Andrew Bennetts

Possibly working news_merge built on a richer structure than just lines.

4822. By Andrew Bennetts

Fix some simple bugs.

Unmerged revisions

4822. By Andrew Bennetts

Fix some simple bugs.

4821. By Andrew Bennetts

Possibly working news_merge built on a richer structure than just lines.

4820. By Andrew Bennetts

Merge object-3way-merge.

4819. By Andrew Bennetts

Merge lp:bzr.

4818. By Andrew Bennetts

Rename test.

4817. By Andrew Bennetts

First steps towards a better NEWS file parser.

4816. By Andrew Bennetts

Add a XXX comment for future improvement.

4815. By Andrew Bennetts

Read section order from template file named in config.

4814. By Andrew Bennetts

Teach news_merge to handle conflicts involving section headings as well as bullets.

4813. By Andrew Bennetts

Add some simple mutters so that it's easy to tell if news_merge has been triggered.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'NEWS'
2--- NEWS 2010-04-20 10:30:30 +0000
3+++ NEWS 2010-04-20 13:35:41 +0000
4@@ -107,6 +107,10 @@
5 less.)
6 (Martin Pool, #553017)
7
8+* The ``news_merge`` plugin is now smarter. It can resolve conflicts
9+ involving section headings as well as bullet points.
10+ (Andrew Bennetts)
11+
12 Documentation
13 *************
14
15
16=== modified file 'bzrlib/merge3.py'
17--- bzrlib/merge3.py 2009-03-23 14:59:43 +0000
18+++ bzrlib/merge3.py 2010-04-20 13:35:41 +0000
19@@ -66,10 +66,24 @@
20 Given BASE, OTHER, THIS, tries to produce a combined text
21 incorporating the changes from both BASE->OTHER and BASE->THIS.
22 All three will typically be sequences of lines."""
23- def __init__(self, base, a, b, is_cherrypick=False):
24- check_text_lines(base)
25- check_text_lines(a)
26- check_text_lines(b)
27+
28+ def __init__(self, base, a, b, is_cherrypick=False, allow_objects=False):
29+ """Constructor.
30+
31+ :param base: lines in BASE
32+ :param a: lines in A
33+ :param b: lines in B
34+ :param is_cherrypick: flag indicating if this merge is a cherrypick.
35+ When cherrypicking b => a, matches with b and base do not conflict.
36+ :param allow_objects: if True, do not require that base, a and b are
37+ plain Python strs. Also prevents BinaryFile from being raised.
38+ Lines can be any sequence of comparable and hashable Python
39+ objects.
40+ """
41+ if not allow_objects:
42+ check_text_lines(base)
43+ check_text_lines(a)
44+ check_text_lines(b)
45 self.base = base
46 self.a = a
47 self.b = b
48
49=== modified file 'bzrlib/plugins/news_merge/__init__.py'
50--- bzrlib/plugins/news_merge/__init__.py 2010-01-28 17:27:16 +0000
51+++ bzrlib/plugins/news_merge/__init__.py 2010-04-20 13:35:41 +0000
52@@ -26,10 +26,30 @@
53 The news_merge_files config option takes a list of file paths, separated by
54 commas.
55
56+The basic approach is that this plugin parses the NEWS file into a simple
57+series of versions, with sections of bullets in those versions. Sections
58+contain a sorted set of bullets, and sections within a version also have a
59+fixed order (see the template at the bottom of NEWS). The plugin merges
60+additions and deletions to the set of bullets (and sections of bullets), then
61+sorts the contents of these sets and turns them back into a series of lines of
62+text.
63+
64 Limitations:
65
66-* if there's a conflict in more than just bullet points, this doesn't yet know
67- how to resolve that, so bzr will fallback to the default line-based merge.
68+* invisible whitespace in blank lines is not tracked, so is discarded. (i.e.
69+ [newline, space, newline] is collapsed to just [newline, newline])
70+
71+* empty sections are generally deleted, even if they were present in the
72+ originals.
73+
74+* modified sections will typically be reordered to match the standard order (as
75+ shown in the template at the bottom of NEWS).
76+
77+* if there's a conflict that involves more than simple sections of bullets,
78+ this plugin doesn't know how to handle that. e.g. a conflict in preamble
79+ text describing a new version, or sufficiently many conflicts that the
80+ matcher thinks a conflict spans a version heading. bzr's builtin merge logic
81+ will be tried instead.
82 """
83
84 # Since we are a built-in plugin we share the bzrlib version
85
86=== modified file 'bzrlib/plugins/news_merge/news_merge.py'
87--- bzrlib/plugins/news_merge/news_merge.py 2010-01-28 18:05:44 +0000
88+++ bzrlib/plugins/news_merge/news_merge.py 2010-04-20 13:35:41 +0000
89@@ -16,12 +16,21 @@
90
91 """Merge logic for news_merge plugin."""
92
93-
94-from bzrlib.plugins.news_merge.parser import simple_parse
95-from bzrlib import merge, merge3
96-
97-
98-magic_marker = '|NEWS-MERGE-MAGIC-MARKER|'
99+import copy
100+
101+from bzrlib.plugins.news_merge.parser import (
102+ ContainerChunk,
103+ parse_lines_to_structure,
104+ simple_parse,
105+ )
106+from bzrlib import merge, merge3, trace
107+
108+
109+class Infinity(object):
110+ """Object that always sorts to the end of a list."""
111+
112+ def __lt__(self, other):
113+ return True
114
115
116 class NewsMerger(merge.ConfigurableFileMerger):
117@@ -29,6 +38,51 @@
118
119 name_prefix = "news"
120
121+ def __init__(self, merger):
122+ super(NewsMerger, self).__init__(merger)
123+ self.canonical_section_order = None
124+
125+ def get_section_ordering(self):
126+ if self.canonical_section_order is None:
127+ # None is a placeholder for an unseen section heading.
128+ sections = [None]
129+ try:
130+ # Read file named by ${name_prefix}_template config option, and
131+ # extract the preferred section order from that.
132+ this_tree = self.merger.this_tree
133+ config = this_tree.branch.get_config()
134+ config_key = self.name_prefix + '_template'
135+ template_path = config.get_user_option(config_key)
136+ template_file_id = this_tree.path2id(template_path)
137+ template = this_tree.get_file_text(template_file_id)
138+ for kind, text in simple_parse(template):
139+ if kind == 'section':
140+ sections.append(text.split('\n', 1)[0])
141+ except Exception:
142+ trace.mutter('could not read NEWS template')
143+ trace.log_exception_quietly()
144+ trace.mutter('news merge section order: %r', sections)
145+ self.canonical_section_order = sections
146+ return self.canonical_section_order
147+
148+ def sort_sections(self, sections):
149+ return sorted(sections, key=self.section_sort_key)
150+
151+ def sort_section_names(self, section_names):
152+ return sorted(section_names, key=self.section_name_sort_key)
153+
154+ def section_sort_key(self, section):
155+ section_name = section.text.split('\n', 1)[0]
156+ return self.section_name_sort_key(section_name)
157+
158+ def section_name_sort_key(self, section):
159+ canonical_section_order = self.get_section_ordering()
160+ try:
161+ return canonical_section_order.index(section)
162+ except ValueError:
163+ # Put unexpected sections last.
164+ return Infinity()
165+
166 def merge_text(self, params):
167 """Perform a simple 3-way merge of a bzr NEWS file.
168
169@@ -36,59 +90,178 @@
170 points, so we can simply take a set of bullet points, determine which
171 bullets to add and which to remove, sort, and reserialize.
172 """
173- # Transform the different versions of the NEWS file into a bunch of
174- # text lines where each line matches one part of the overall
175- # structure, e.g. a heading or bullet.
176- def munge(lines):
177- return list(blocks_to_fakelines(simple_parse(''.join(lines))))
178- this_lines = munge(params.this_lines)
179- other_lines = munge(params.other_lines)
180- base_lines = munge(params.base_lines)
181- m3 = merge3.Merge3(base_lines, this_lines, other_lines)
182- result_lines = []
183+ trace.mutter('news_merge triggered')
184+ this_news_file = canonicalise_news_file(parse_lines_to_structure(params.this_lines), self)
185+ other_news_file = canonicalise_news_file(parse_lines_to_structure(params.other_lines), self)
186+ base_news_file = canonicalise_news_file(parse_lines_to_structure(params.base_lines), self)
187+ m3 = merge3.Merge3(list(base_news_file.flatten()),
188+ list(this_news_file.flatten()),
189+ list(other_news_file.flatten()), allow_objects=True)
190+ result_chunks = []
191 for group in m3.merge_groups():
192 if group[0] == 'conflict':
193 _, base, a, b = group
194- # Are all the conflicting lines bullets? If so, we can merge
195- # this.
196- for line_set in [base, a, b]:
197- for line in line_set:
198- if not line.startswith('bullet'):
199- # Something else :(
200- # Maybe the default merge can cope.
201- return 'not_applicable', None
202- # Calculate additions and deletions.
203- new_in_a = set(a).difference(base)
204- new_in_b = set(b).difference(base)
205- all_new = new_in_a.union(new_in_b)
206- deleted_in_a = set(base).difference(a)
207- deleted_in_b = set(base).difference(b)
208- # Combine into the final set of bullet points.
209- final = all_new.difference(deleted_in_a).difference(
210- deleted_in_b)
211- # Sort, and emit.
212- final = sorted(final, key=sort_key)
213- result_lines.extend(final)
214+ # Are all the conflicting lines bullets or sections? If so, we
215+ # can merge this.
216+ try:
217+ base_sections = chunks_to_section_dict(base)
218+ a_sections = chunks_to_section_dict(a)
219+ b_sections = chunks_to_section_dict(b)
220+ except MergeTooHard, mth:
221+ # Something else :(
222+ # Maybe the default merge can cope.
223+ trace.mutter('news_merge giving up: %s', mth)
224+ return 'not_applicable', None
225+
226+ # Basically, for every section present in any version, call
227+ # merge_bullets (passing an empty set for versions missing
228+ # that section), and if the resulting set of bullets is not
229+ # empty, emit the section heading and the sorted set of
230+ # bullets.
231+ all_sections = set(
232+ base_sections.keys() + a_sections.keys() +
233+ b_sections.keys())
234+ sections_in_order = self.sort_section_names(all_sections)
235+ for section in sections_in_order:
236+ bullets = merge_bullets(
237+ base_sections.get(section, set()),
238+ a_sections.get(section, set()),
239+ b_sections.get(section, set()))
240+ if bullets:
241+ # Emit section heading (if any), then sorted bullets.
242+ if section is not None:
243+ result_chunks.append(
244+ ContainerChunk(
245+ 'section',
246+ section + '\n' + '*'*len(section)))
247+ final = sorted(bullets, key=sort_key)
248+ result_chunks.extend(final)
249 else:
250- result_lines.extend(group[1])
251+ result_chunks.extend(group[1])
252 # Transform the merged elements back into real blocks of lines.
253- return 'success', list(fakelines_to_blocks(result_lines))
254-
255-
256-def blocks_to_fakelines(blocks):
257- for kind, text in blocks:
258- yield '%s%s%s' % (kind, magic_marker, text)
259-
260-
261-def fakelines_to_blocks(fakelines):
262- fakelines = list(fakelines)
263- # Strip out the magic_marker, and reinstate the \n\n between blocks
264- for fakeline in fakelines[:-1]:
265- yield fakeline.split(magic_marker, 1)[1] + '\n\n'
266- # The final block doesn't have a trailing \n\n.
267- for fakeline in fakelines[-1:]:
268- yield fakeline.split(magic_marker, 1)[1]
269-
270-
271-def sort_key(s):
272- return s.replace('`', '').lower()
273+ trace.mutter('news_merge succeeded.')
274+ filename = self.merger.this_tree.id2path(params.file_id)
275+ trace.note('Merged by news_merge: %s', filename)
276+ result_lines = ''.join(chunk.text for chunk in result_chunks)
277+ return 'success', result_lines
278+
279+
280+def merge_bullets(base_bullets, a_bullets, b_bullets):
281+ # Calculate additions and deletions.
282+ new_in_a = a_bullets.difference(base_bullets)
283+ new_in_b = b_bullets.difference(base_bullets)
284+ all_new = new_in_a.union(new_in_b)
285+ deleted_in_a = base_bullets.difference(a_bullets)
286+ deleted_in_b = base_bullets.difference(b_bullets)
287+ # Combine into the final set of bullet points.
288+ final = all_new.difference(deleted_in_a).difference(deleted_in_b)
289+ return final
290+
291+
292+class MergeTooHard(Exception):
293+ pass
294+
295+
296+def chunks_to_section_dict(chunks):
297+ """Takes a sequence of chunks, and returns a dict mapping section to
298+ a set of bullets.
299+
300+ :param chunks: a sequence of chunks
301+ :raises MergeTooHard: when chunks contain anything other than sections or
302+ bullets
303+ :returns: a dict of section name -> set of bullet chunks. Any
304+ bullets encounted before a section will have a name of None.
305+ """
306+ section_name = None
307+ section_dict = {}
308+ for chunk in chunks:
309+ if chunk.kind == 'section':
310+ section_name = chunk.text.split('\n', 1)[0]
311+ elif chunk.kind == 'bullet':
312+ try:
313+ bullets = section_dict[section_name]
314+ except KeyError:
315+ bullets = section_dict[section_name] = set()
316+ bullets.add(chunk)
317+ else:
318+ raise MergeTooHard(chunk)
319+ return section_dict
320+
321+
322+def sort_key(chunk):
323+ return chunk.text.replace('`', '').lower()
324+
325+
326+def canonicalise_news_file(news_file, merger):
327+ new_chunks = []
328+ for chunk in news_file.chunks:
329+ if chunk.kind == 'release':
330+ chunk = canonicalise_release(chunk, merger)
331+ new_chunks.append(chunk)
332+ news_file = copy.copy(news_file)
333+ news_file.chunks = new_chunks
334+ return news_file
335+
336+
337+def canonicalise_release(release, merger):
338+ preamble = True
339+ new_chunks = []
340+ sections = []
341+ for chunk in release.chunks:
342+ if preamble and chunk.kind != 'section':
343+ new_chunks.append(chunk)
344+ continue
345+ elif chunk.kind == 'section':
346+ preamble = False
347+ section = canonicalise_section(chunk)
348+ sections.append(section)
349+ else:
350+ # not preamble, not section... must be trailing garbage. Blah.
351+ # XXX: should probably raise an error or something. For now just
352+ # add it to new_chunks, it'll become part of the preamble.
353+ new_chunks.append(chunk)
354+
355+ # Sort the sections by name
356+ sections = merger.sort_sections(sections)
357+ # Combine duplicated sections (which will be adjacent after the sorting)
358+ canonical_sections = []
359+ for section in sections:
360+ if canonical_sections and canonical_sections[-1].text == section.text:
361+ # Identical. Combine them.
362+ chunks = canonical_sections[-1].chunks + section.chunks
363+ section = copy.copy(section)
364+ section.chunks = chunks
365+ section = canonicalise_section(section)
366+ canonical_sections[-1] = section
367+ continue
368+ else:
369+ canonical_sections.append(section)
370+ new_chunks.extend(canonical_sections)
371+ release = copy.copy(release)
372+ release.chunks = new_chunks
373+ return release
374+
375+
376+def canonicalise_section(section):
377+ preamble = True
378+ new_chunks = []
379+ bullets = set()
380+ for chunk in section.chunks:
381+ if preamble and chunk.kind != 'bullet':
382+ new_chunks.append(chunk)
383+ continue
384+ elif chunk.kind == 'bullet':
385+ preamble = False
386+ bullets.add(chunk.text)
387+ else:
388+ # not preamble, not bullet... must be trailing garbage. Blah.
389+ # XXX: should probably raise an error or something. For now just
390+ # add it to new_chunks, it'll become part of the preamble.
391+ new_chunks.append(chunk)
392+ new_section = copy.copy(section)
393+ new_section.chunks = new_chunks
394+ bullets = sorted(bullets, key=sort_key)
395+ for bullet in bullets:
396+ new_section.add_leaf('bullet', bullet)
397+ return new_section
398+
399
400=== modified file 'bzrlib/plugins/news_merge/parser.py'
401--- bzrlib/plugins/news_merge/parser.py 2010-01-18 07:00:11 +0000
402+++ bzrlib/plugins/news_merge/parser.py 2010-04-20 13:35:41 +0000
403@@ -24,6 +24,194 @@
404 simple_parse's docstring).
405 """
406
407+# [root]
408+# - Heading
409+# - Text
410+# - Release
411+# - Text
412+# - Section
413+# - Bullet
414+# - Section
415+# - Bullet
416+# - Bullet
417+# - Release
418+# - Text
419+# - Section
420+# - Bullet
421+# - Section
422+# - Text
423+# - Bullet
424+# - Text
425+
426+class ContainerChunk(object):
427+
428+ def __init__(self, kind, text):
429+ self.chunks = []
430+ self.kind = kind
431+ self.text = text
432+
433+ def __repr__(self):
434+ if len(self.text) > 20:
435+ abbr_text = self.text[:20] + '...'
436+ else:
437+ abbr_text = self.text
438+ return '<%s kind=%s text=%s>' % (
439+ self.__class__.__name__, self.kind, repr(abbr_text))
440+
441+ def __cmp__(self, other):
442+ if not isinstance(other, ContainerChunk):
443+ return NotImplemented
444+ return cmp(
445+ (self.kind, self.text, self.chunks),
446+ (other.kind, other.text, self.chunks))
447+
448+ def __hash__(self):
449+ return hash((self.kind, self.text, tuple(self.chunks)))
450+
451+# def __eq__(self, other):
452+# return (
453+# self.kind == other.kind and
454+# self.text == other.text and
455+# self.chunks == other.chunks)
456+#
457+# def __lt__(self, other):
458+# return (
459+# self.kind < other.kind or
460+# self.text < other.text or
461+# self.chunks < other.chunks)
462+#
463+ def add_container(self, kind, text):
464+ container = ContainerChunk(kind, text)
465+ self.chunks.append(container)
466+ return container
467+
468+ def add_leaf(self, kind, text):
469+ if kind == 'blank':
470+ # Attach this blank text to the previous chunk (which might be
471+ # self), rather than tracking it as its own leaf.
472+ if self.chunks:
473+ self.chunks[-1].text += text
474+ else:
475+ self.text += text
476+ return
477+ chunk = LeafChunk(kind, text)
478+ self.chunks.append(chunk)
479+ return chunk
480+
481+ def flatten(self):
482+ yield self
483+ for chunk in self.chunks:
484+ for elem in chunk.flatten():
485+ yield elem
486+
487+ def as_text_iter(self):
488+ yield self.text
489+ for chunk in self.chunks:
490+ for elem in chunk.as_text_iter():
491+ yield elem
492+
493+ def as_text(self):
494+ return ''.join(self.as_text_iter())
495+
496+
497+class NewsFile(ContainerChunk):
498+
499+ def __init__(self):
500+ ContainerChunk.__init__(self, '(root)', '')
501+
502+
503+class LeafChunk(object):
504+
505+ def __init__(self, kind, text):
506+ self.kind = kind
507+ self.text = text
508+
509+ def __repr__(self):
510+ if len(self.text) > 20:
511+ abbr_text = self.text[:20] + '...'
512+ else:
513+ abbr_text = self.text
514+ return '<%s kind=%s text=%s>' % (
515+ self.__class__.__name__, self.kind, repr(abbr_text))
516+
517+ def __cmp__(self, other):
518+ if not isinstance(other, LeafChunk):
519+ return NotImplemented
520+ return cmp((self.kind, self.text), (other.kind, other.text))
521+
522+ def __hash__(self):
523+ return hash((self.kind, self.text))
524+#
525+# def __eq__(self, other):
526+# return (self.kind == other.kind and self.text == other.text)
527+#
528+# def __lt__(self, other):
529+# return (self.kind < other.kind or self.text < other.text)
530+#
531+ def flatten(self):
532+ yield self
533+
534+ def as_text_iter(self):
535+ yield self.text
536+
537+
538+import re
539+
540+
541+class ParseState(object):
542+ def __init__(self):
543+ #self.news_file = NewsFile()
544+ self.object_stack = []
545+
546+
547+class BadNewsFile(Exception):
548+ """The NEWS file could not be parsed."""
549+
550+
551+def parse_lines_to_structure(lines):
552+ """Same as parse_to_structure, but takes an iterable of strs rather than a
553+ single str.
554+ """
555+ return parse_to_structure(''.join(lines))
556+
557+
558+def parse_to_structure(content):
559+ news_file = NewsFile()
560+ leaf_kinds = ('bullet', 'empty', 'text', 'blank')
561+ # There's a strict hierarchy:
562+ # Headings contain releases contain sections
563+ # Releases never contain releases, etc.
564+ # (Any container may contain a leaf, though.)
565+ container_hierarchy = ['(root)', 'heading', 'release', 'section']
566+
567+ stack = [news_file]
568+ #import pdb; pdb.set_trace()
569+ for kind, text in simple_parse(content):
570+ #print kind, repr(text)
571+ if kind in leaf_kinds:
572+ stack[-1].add_leaf(kind, text)
573+ elif kind in container_hierarchy:
574+ # Pop the container stack until we find the right level to add this
575+ # chunk.
576+ new_rank = container_hierarchy.index(kind)
577+ while True:
578+ old_rank = container_hierarchy.index(stack[-1].kind)
579+ if new_rank > old_rank:
580+ break
581+ stack.pop()
582+ container = stack[-1].add_container(kind, text)
583+ stack.append(container)
584+ else:
585+ raise AssertionError('unexpected chunk kind: %r' % (kind,))
586+ return news_file
587+
588+
589+def simple_parse_lines(lines):
590+ """Same as simple_parse, but takes an iterable of strs rather than a single
591+ str.
592+ """
593+ return simple_parse(''.join(lines))
594+
595
596 def simple_parse(content):
597 """Returns blocks, where each block is a 2-tuple (kind, text).
598@@ -31,8 +219,15 @@
599 :kind: one of 'heading', 'release', 'section', 'empty' or 'text'.
600 :text: a str, including newlines.
601 """
602- blocks = content.split('\n\n')
603+ # Split on blank lines.
604+ blankline_re = '(\n *\n)'
605+ blocks = re.split(blankline_re, content)
606 for block in blocks:
607+ match = re.match(blankline_re, block)
608+ if match is not None and match.groups()[0] == block:
609+ # blank line
610+ yield 'blank', block
611+ continue
612 if block.startswith('###'):
613 # First line is ###...: Top heading
614 yield 'heading', block
615
616=== modified file 'bzrlib/plugins/news_merge/tests/__init__.py'
617--- bzrlib/plugins/news_merge/tests/__init__.py 2010-01-20 16:05:28 +0000
618+++ bzrlib/plugins/news_merge/tests/__init__.py 2010-04-20 13:35:41 +0000
619@@ -16,6 +16,7 @@
620
621 def load_tests(basic_tests, module, loader):
622 testmod_names = [
623+ 'test_parser',
624 'test_news_merge',
625 ]
626 basic_tests.addTest(loader.loadTestsFromModuleNames(
627
628=== added file 'bzrlib/plugins/news_merge/tests/test_parser.py'
629--- bzrlib/plugins/news_merge/tests/test_parser.py 1970-01-01 00:00:00 +0000
630+++ bzrlib/plugins/news_merge/tests/test_parser.py 2010-04-20 13:35:41 +0000
631@@ -0,0 +1,116 @@
632+# Copyright (C) 2010 by Canonical Ltd
633+#
634+# This program is free software; you can redistribute it and/or modify
635+# it under the terms of the GNU General Public License as published by
636+# the Free Software Foundation; either version 2 of the License, or
637+# (at your option) any later version.
638+#
639+# This program is distributed in the hope that it will be useful,
640+# but WITHOUT ANY WARRANTY; without even the implied warranty of
641+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
642+# GNU General Public License for more details.
643+#
644+# You should have received a copy of the GNU General Public License
645+# along with this program; if not, write to the Free Software
646+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
647+
648+
649+from bzrlib.tests import TestCase
650+
651+from bzrlib.plugins.news_merge import parser
652+
653+
654+# Define an example NEWS file with the following structure:
655+# [root]
656+# - Heading
657+# - Text
658+# - Release
659+# - Text
660+# - Section
661+# - Bullet
662+# - Section
663+# - Bullet
664+# - Bullet
665+# - Release
666+# - Text
667+# - Section
668+# - Bullet
669+# - Section
670+# - Text
671+# - Bullet
672+# - Text
673+
674+example_file = """\
675+####################
676+Bazaar Release Notes
677+####################
678+
679+.. contents:: List of Releases
680+ :depth: 1
681+
682+bzr x.y.z (not released yet)
683+############################
684+
685+:Codename: template
686+:x.y.z: ???
687+
688+Compatibility Breaks
689+********************
690+
691+* Bullet
692+
693+New Features
694+************
695+
696+* Bullet 1
697+
698+* Bullet 2
699+
700+Bug Fixes
701+*********
702+
703+bzr x.y.y
704+#########
705+
706+:Codename: previous
707+
708+Compatibility Breaks
709+********************
710+
711+* Bullet
712+
713+New Features
714+************
715+
716+Preamble text for section.
717+
718+* Bullet, not text.
719+
720+Footnote.
721+"""
722+
723+class TestStructuredParseSmokeTests(TestCase):
724+ """Smoke tests parse_to_structure using example_file."""
725+
726+ def test_parse(self):
727+ """example_file can be parsed without an error."""
728+ news_file = parser.parse_to_structure(example_file)
729+
730+ def test_roundtrip(self):
731+ """The NewsFile object can regenerate the original bytes."""
732+ news_file = parser.parse_to_structure(example_file)
733+ self.assertEqualDiff(example_file, news_file.as_text())
734+
735+ def test_flatten(self):
736+ """NewsFile.flatten shows the file has been interpreted as
737+ releases/sections/bullets etc.
738+ """
739+ news_file = parser.parse_to_structure(example_file)
740+ expected_kinds = ['(root)', 'heading', 'text', 'release', 'text',
741+ 'section', 'bullet', 'section', 'bullet', 'bullet', 'section',
742+ 'release', 'text', 'section', 'bullet', 'section', 'text',
743+ 'bullet', 'text']
744+ kinds = [chunk.kind for chunk in news_file.flatten()
745+ if chunk.kind != 'blank']
746+ self.assertEqual(expected_kinds, kinds)
747+