Bazaar

Merge lp:~spiv/bzr/better-news-merge into lp:bzr

better-news-merge
Merge into bzr.dev

Proposed by Andrew Bennetts on 2010-02-13

Status:

Work in progress

Proposed branch:

lp:~spiv/bzr/better-news-merge

Merge into:

lp:bzr

Diff against target:

747 lines (+587/-64)

7 files modified

NEWS (+4/-0)
bzrlib/merge3.py (+18/-4)
bzrlib/plugins/news_merge/__init__.py (+22/-2)
bzrlib/plugins/news_merge/news_merge.py (+230/-57)
bzrlib/plugins/news_merge/parser.py (+196/-1)
bzrlib/plugins/news_merge/tests/__init__.py (+1/-0)
bzrlib/plugins/news_merge/tests/test_parser.py (+116/-0)

To merge this branch:

bzr merge lp:~spiv/bzr/better-news-merge

Reviewer	Review Type	Date Requested	Status
Robert Collins (community)		2010-02-13	Needs Fixing on 2010-02-13
Review via email: mp+19247@code.launchpad.net

This proposal supersedes a proposal from 2010-02-12.

Andrew Bennetts (spiv) wrote on 2010-02-12: Posted in a previous version of this proposal

This patch makes the news_merge plugin more capable. Hopefully the changes to comments and docstrings in the patch explain the changes well enough, but in short it makes it capable of coping with conflicts that span section headings (like 'Bug Fixes'), whereas before it only dealt with conflicts between bullet points within a section. See the comments and code for the details (and limitations and tradeoffs).

With this change I expect news_merge will handle the vast majority of NEWS merges.

A minor orthogonal change adds a couple of trivial mutters to let a reader of ~/.bzr.log know when news_merge has been used.

Robert Collins (lifeless) wrote on 2010-02-12: Posted in a previous version of this proposal

The patch seems to have conflicts and a lot of noise :(

-Rob

Andrew Bennetts (spiv) wrote on 2010-02-13: Posted in a previous version of this proposal

Oh, I wrote the code against 2.1, but proposed against lp:bzr. Hmm.

Andrew Bennetts (spiv) wrote on 2010-02-13:

Resubmitted in hope of getting a better diff now that lp:bzr/2.1 (which this branch was based on) has been merged to lp:bzr.

Original proposal:

"""
This patch makes the news_merge plugin more capable. Hopefully the changes to comments and docstrings in the patch explain the changes well enough, but in short it makes it capable of coping with conflicts that span section headings (like 'Bug Fixes'), whereas before it only dealt with conflicts between bullet points within a section. See the comments and code for the details (and limitations and tradeoffs).

With this change I expect news_merge will handle the vast majority of NEWS merges.

A minor orthogonal change adds a couple of trivial mutters to let a reader of ~/.bzr.log know when news_merge has been used.
"""
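
For illustration, the per-section merge described above can be sketched as a three-way merge over sets of bullets. This is a minimal sketch, not the code in the patch: the function names, the plain-string bullets, and the explicit `order` list are assumptions made for the example. Unlike the patch's `merge_bullets`, this version also keeps bullets that are unchanged from base.

```python
def merge_bullets(base, this, other):
    """Three-way merge of bullet sets (hypothetical helper).

    Keeps additions from either side and drops anything either side deleted.
    """
    added = (this - base) | (other - base)
    deleted = (base - this) | (base - other)
    return (base | added) - deleted


def merge_sections(base, this, other, order):
    """Merge dicts mapping section name -> set of bullet strings.

    `order` is the preferred section order; unknown sections sort last.
    Returns a list of (section name, sorted bullets), skipping empty sections.
    """
    names = set(base) | set(this) | set(other)

    def rank(name):
        if name in order:
            return (0, order.index(name), "")
        return (1, 0, name)

    merged = []
    for name in sorted(names, key=rank):
        bullets = merge_bullets(
            base.get(name, set()), this.get(name, set()), other.get(name, set()))
        if bullets:
            # Same normalisation as the plugin's sort_key: ignore backticks and case.
            merged.append(
                (name, sorted(bullets, key=lambda b: b.replace('`', '').lower())))
    return merged
```

With one side adding a bullet to an existing section and the other side adding a whole new section, both changes survive and the sections come out in the preferred order.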

Robert Collins (lifeless) wrote on 2010-02-13:

On Sat, 2010-02-13 at 01:30 +0000, Andrew Bennetts wrote:
>
> magic_marker = '|NEWS-MERGE-MAGIC-MARKER|'
>
> +# The order sections are supposed to appear in. See the template at the
> +# bottom of NEWS. None is a placeholder for an unseen section heading.
> +canonical_section_order = [
> +    None, 'Compatibility Breaks', 'New Features', 'Bug Fixes', 'Improvements',
> +    'Documentation', 'API Changes', 'Internals', 'Testing']

This is duplicated with the template; perhaps you could use the template
instead? That would make this usable by other projects.
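
A rough sketch of what reading the order from a template could look like, assuming section headings are a title line underlined with a row of `*` at least as long as the title. The function names here are hypothetical and do not come from the patch.

```python
def section_order_from_template(template_text):
    """Return section titles in the order they appear in the template.

    Assumes a section heading is a non-empty line followed by a line made
    only of '*' characters that is at least as long as the title.
    """
    lines = template_text.splitlines()
    order = []
    for title, underline in zip(lines, lines[1:]):
        if (title.strip() and underline
                and set(underline) == {"*"}
                and len(underline) >= len(title)):
            order.append(title.strip())
    return order


def section_rank(name, order):
    """Sort key: known sections by template position, unknown ones last."""
    try:
        return (0, order.index(name))
    except ValueError:
        return (1, 0)
```

Using a tuple as the sort key avoids needing a special "sorts last" object: any unknown section gets a key that compares greater than every known one.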
...
> +                # Are all the conflicting lines bullets or sections?  If so, we
> +                # can merge this.
> +                try:
> +                    base_sections = munged_lines_to_section_dict(base)
> +                    a_sections = munged_lines_to_section_dict(a)
> +                    b_sections = munged_lines_to_section_dict(b)
> +                except MergeTooHard:
> +                    # Something else :(
> +                    # Maybe the default merge can cope.
> +                    trace.mutter('news_merge giving up')
> +                    return 'not_applicable', None

In the NEWS entry you aren't entirely clear about the implications of
'using bzr's builtin merge'. From the code it looks like 'if there
are conflicts outside the structured sections, none of the NEWS file is
smart-merged'. Perhaps we could merge just the non-section data with
bzr's built-in merge, or make the NEWS entry clearer.
...
>          # Transform the merged elements back into real blocks of lines.
> +        trace.mutter('news_merge giving up')
>          return 'success', list(fakelines_to_blocks(result_lines))

This mutter seems...wrong.

review: Needs Fixing

lp:~spiv/bzr/better-news-merge updated on 2010-04-20

4815. By Andrew Bennetts on 2010-03-19: Read section order from template file named in config.
4816. By Andrew Bennetts on 2010-03-19: Add a XXX comment for future improvement.
4817. By Andrew Bennetts on 2010-04-19: First steps towards a better NEWS file parser.
4818. By Andrew Bennetts on 2010-04-20: Rename test.
4819. By Andrew Bennetts on 2010-04-20: Merge lp:bzr.
4820. By Andrew Bennetts on 2010-04-20: Merge object-3way-merge.
4821. By Andrew Bennetts on 2010-04-20: Possibly working news_merge built on a richer structure than just lines.
4822. By Andrew Bennetts on 2010-04-20: Fix some simple bugs.

Unmerged revisions

4822. By Andrew Bennetts on 2010-04-20: Fix some simple bugs.
4821. By Andrew Bennetts on 2010-04-20: Possibly working news_merge built on a richer structure than just lines.
4820. By Andrew Bennetts on 2010-04-20: Merge object-3way-merge.
4819. By Andrew Bennetts on 2010-04-20: Merge lp:bzr.
4818. By Andrew Bennetts on 2010-04-20: Rename test.
4817. By Andrew Bennetts on 2010-04-19: First steps towards a better NEWS file parser.
4816. By Andrew Bennetts on 2010-03-19: Add a XXX comment for future improvement.
4815. By Andrew Bennetts on 2010-03-19: Read section order from template file named in config.
4814. By Andrew Bennetts on 2010-02-12: Teach news_merge to handle conflicts involving section headings as well as bullets.
4813. By Andrew Bennetts on 2010-02-12: Add some simple mutters so that it's easy to tell if news_merge has been triggered.

Preview Diff

 === modified file 'NEWS'
 --- NEWS	2010-04-20 10:30:30 +0000
 +++ NEWS	2010-04-20 13:35:41 +0000
@@ -107,6 +107,10 @@
    less.)
    (Martin Pool, #553017)
++* The ``news_merge`` plugin is now smarter.  It can resolve conflicts
++  involving section headings as well as bullet points.
++  (Andrew Bennetts)
++
  Documentation
  *************
 === modified file 'bzrlib/merge3.py'
 --- bzrlib/merge3.py	2009-03-23 14:59:43 +0000
 +++ bzrlib/merge3.py	2010-04-20 13:35:41 +0000
@@ -66,10 +66,24 @@
      Given BASE, OTHER, THIS, tries to produce a combined text
      incorporating the changes from both BASE->OTHER and BASE->THIS.
      All three will typically be sequences of lines."""
--    def __init__(self, base, a, b, is_cherrypick=False):
--        check_text_lines(base)
--        check_text_lines(a)
--        check_text_lines(b)
++
++    def __init__(self, base, a, b, is_cherrypick=False, allow_objects=False):
++        """Constructor.
++
++        :param base: lines in BASE
++        :param a: lines in A
++        :param b: lines in B
++        :param is_cherrypick: flag indicating if this merge is a cherrypick.
++            When cherrypicking b => a, matches with b and base do not conflict.
++        :param allow_objects: if True, do not require that base, a and b are
++            plain Python strs.  Also prevents BinaryFile from being raised.
++            Lines can be any sequence of comparable and hashable Python
++            objects.
++        """
++        if not allow_objects:
++            check_text_lines(base)
++            check_text_lines(a)
++            check_text_lines(b)
          self.base = base
          self.a = a
          self.b = b
 === modified file 'bzrlib/plugins/news_merge/__init__.py'
 --- bzrlib/plugins/news_merge/__init__.py	2010-01-28 17:27:16 +0000
 +++ bzrlib/plugins/news_merge/__init__.py	2010-04-20 13:35:41 +0000
@@ -26,10 +26,30 @@
  The news_merge_files config option takes a list of file paths, separated by
  commas.
++The basic approach is that this plugin parses the NEWS file into a simple
++series of versions, with sections of bullets in those versions.  Sections
++contain a sorted set of bullets, and sections within a version also have a
++fixed order (see the template at the bottom of NEWS).  The plugin merges
++additions and deletions to the set of bullets (and sections of bullets), then
++sorts the contents of these sets and turns them back into a series of lines of
++text.
++
  Limitations:
--* if there's a conflict in more than just bullet points, this doesn't yet know
--  how to resolve that, so bzr will fallback to the default line-based merge.
++* invisible whitespace in blank lines is not tracked, so is discarded.  (i.e.
++  [newline, space, newline] is collapsed to just [newline, newline])
++
++* empty sections are generally deleted, even if they were present in the
++  originals.
++
++* modified sections will typically be reordered to match the standard order (as
++  shown in the template at the bottom of NEWS).
++
++* if there's a conflict that involves more than simple sections of bullets,
++  this plugin doesn't know how to handle that.  e.g. a conflict in preamble
++  text describing a new version, or sufficiently many conflicts that the
++  matcher thinks a conflict spans a version heading.  bzr's builtin merge logic
++  will be tried instead.
  """
  # Since we are a built-in plugin we share the bzrlib version
 === modified file 'bzrlib/plugins/news_merge/news_merge.py'
 --- bzrlib/plugins/news_merge/news_merge.py	2010-01-28 18:05:44 +0000
 +++ bzrlib/plugins/news_merge/news_merge.py	2010-04-20 13:35:41 +0000
@@ -16,12 +16,21 @@
  """Merge logic for news_merge plugin."""
--
--from bzrlib.plugins.news_merge.parser import simple_parse
--from bzrlib import merge, merge3
--
--
--magic_marker = '|NEWS-MERGE-MAGIC-MARKER|'
++import copy
++
++from bzrlib.plugins.news_merge.parser import (
++    ContainerChunk,
++    parse_lines_to_structure,
++    simple_parse,
++    )
++from bzrlib import merge, merge3, trace
++
++
++class Infinity(object):
++    """Object that always sorts to the end of a list."""
++
++    def __lt__(self, other):
++        return False
  class NewsMerger(merge.ConfigurableFileMerger):
@@ -29,6 +38,51 @@
      name_prefix = "news"
++    def __init__(self, merger):
++        super(NewsMerger, self).__init__(merger)
++        self.canonical_section_order = None
++
++    def get_section_ordering(self):
++        if self.canonical_section_order is None:
++            # None is a placeholder for an unseen section heading.
++            sections = [None]
++            try:
++                # Read file named by ${name_prefix}_template config option, and
++                # extract the preferred section order from that.
++                this_tree = self.merger.this_tree
++                config = this_tree.branch.get_config()
++                config_key = self.name_prefix + '_template'
++                template_path = config.get_user_option(config_key)
++                template_file_id = this_tree.path2id(template_path)
++                template = this_tree.get_file_text(template_file_id)
++                for kind, text in simple_parse(template):
++                    if kind == 'section':
++                        sections.append(text.split('\n', 1)[0])
++            except Exception:
++                trace.mutter('could not read NEWS template')
++                trace.log_exception_quietly()
++            trace.mutter('news merge section order: %r', sections)
++            self.canonical_section_order = sections
++        return self.canonical_section_order
++
++    def sort_sections(self, sections):
++        return sorted(sections, key=self.section_sort_key)
++
++    def sort_section_names(self, section_names):
++        return sorted(section_names, key=self.section_name_sort_key)
++
++    def section_sort_key(self, section):
++        section_name = section.text.split('\n', 1)[0]
++        return self.section_name_sort_key(section_name)
++
++    def section_name_sort_key(self, section):
++        canonical_section_order = self.get_section_ordering()
++        try:
++            return canonical_section_order.index(section)
++        except ValueError:
++            # Put unexpected sections last.
++            return Infinity()
++
      def merge_text(self, params):
          """Perform a simple 3-way merge of a bzr NEWS file.
@@ -36,59 +90,178 @@
          points, so we can simply take a set of bullet points, determine which
          bullets to add and which to remove, sort, and reserialize.
          """
--        # Transform the different versions of the NEWS file into a bunch of
--        # text lines where each line matches one part of the overall
--        # structure, e.g. a heading or bullet.
--        def munge(lines):
--            return list(blocks_to_fakelines(simple_parse(''.join(lines))))
--        this_lines = munge(params.this_lines)
--        other_lines = munge(params.other_lines)
--        base_lines = munge(params.base_lines)
--        m3 = merge3.Merge3(base_lines, this_lines, other_lines)
--        result_lines = []
++        trace.mutter('news_merge triggered')
++        this_news_file = canonicalise_news_file(parse_lines_to_structure(params.this_lines), self)
++        other_news_file = canonicalise_news_file(parse_lines_to_structure(params.other_lines), self)
++        base_news_file = canonicalise_news_file(parse_lines_to_structure(params.base_lines), self)
++        m3 = merge3.Merge3(list(base_news_file.flatten()),
++                list(this_news_file.flatten()),
++                list(other_news_file.flatten()), allow_objects=True)
++        result_chunks = []
          for group in m3.merge_groups():
              if group[0] == 'conflict':
                  _, base, a, b = group
--                # Are all the conflicting lines bullets?  If so, we can merge
--                # this.
--                for line_set in [base, a, b]:
--                    for line in line_set:
--                        if not line.startswith('bullet'):
--                            # Something else :(
--                            # Maybe the default merge can cope.
--                            return 'not_applicable', None
--                # Calculate additions and deletions.
--                new_in_a = set(a).difference(base)
--                new_in_b = set(b).difference(base)
--                all_new = new_in_a.union(new_in_b)
--                deleted_in_a = set(base).difference(a)
--                deleted_in_b = set(base).difference(b)
--                # Combine into the final set of bullet points.
--                final = all_new.difference(deleted_in_a).difference(
--                    deleted_in_b)
--                # Sort, and emit.
--                final = sorted(final, key=sort_key)
--                result_lines.extend(final)
++                # Are all the conflicting lines bullets or sections?  If so, we
++                # can merge this.
++                try:
++                    base_sections = chunks_to_section_dict(base)
++                    a_sections = chunks_to_section_dict(a)
++                    b_sections = chunks_to_section_dict(b)
++                except MergeTooHard, mth:
++                    # Something else :(
++                    # Maybe the default merge can cope.
++                    trace.mutter('news_merge giving up: %s', mth)
++                    return 'not_applicable', None
++
++                # Basically, for every section present in any version, call
++                # merge_bullets (passing an empty set for versions missing
++                # that section), and if the resulting set of bullets is not
++                # empty, emit the section heading and the sorted set of
++                # bullets.
++                all_sections = set(
++                    base_sections.keys() + a_sections.keys() +
++                    b_sections.keys())
++                sections_in_order = self.sort_section_names(all_sections)
++                for section in sections_in_order:
++                    bullets = merge_bullets(
++                        base_sections.get(section, set()),
++                        a_sections.get(section, set()),
++                        b_sections.get(section, set()))
++                    if bullets:
++                        # Emit section heading (if any), then sorted bullets.
++                        if section is not None:
++                            result_chunks.append(
++                                ContainerChunk(
++                                    'section',
++                                    section + '\n' + '*'*len(section)))
++                        final = sorted(bullets, key=sort_key)
++                        result_chunks.extend(final)
              else:
--                result_lines.extend(group[1])
++                result_chunks.extend(group[1])
          # Transform the merged elements back into real blocks of lines.
--        return 'success', list(fakelines_to_blocks(result_lines))
--
--
--def blocks_to_fakelines(blocks):
--    for kind, text in blocks:
--        yield '%s%s%s' % (kind, magic_marker, text)
--
--
--def fakelines_to_blocks(fakelines):
--    fakelines = list(fakelines)
--    # Strip out the magic_marker, and reinstate the \n\n between blocks
--    for fakeline in fakelines[:-1]:
--        yield fakeline.split(magic_marker, 1)[1] + '\n\n'
--    # The final block doesn't have a trailing \n\n.
--    for fakeline in fakelines[-1:]:
--        yield fakeline.split(magic_marker, 1)[1]
--
--
--def sort_key(s):
--    return s.replace('`', '').lower()
++        trace.mutter('news_merge succeeded.')
++        filename = self.merger.this_tree.id2path(params.file_id)
++        trace.note('Merged by news_merge: %s', filename)
++        result_lines = ''.join(chunk.text for chunk in result_chunks)
++        return 'success', result_lines
++
++
++def merge_bullets(base_bullets, a_bullets, b_bullets):
++    # Calculate additions and deletions.
++    new_in_a = a_bullets.difference(base_bullets)
++    new_in_b = b_bullets.difference(base_bullets)
++    all_new = new_in_a.union(new_in_b)
++    deleted_in_a = base_bullets.difference(a_bullets)
++    deleted_in_b = base_bullets.difference(b_bullets)
++    # Combine into the final set of bullet points.
++    final = all_new.difference(deleted_in_a).difference(deleted_in_b)
++    return final
++
++
++class MergeTooHard(Exception):
++    pass
++
++
++def chunks_to_section_dict(chunks):
++    """Takes a sequence of chunks, and returns a dict mapping section to
++    a set of bullets.
++
++    :param chunks: a sequence of chunks
++    :raises MergeTooHard: when chunks contain anything other than sections or
++        bullets
++    :returns: a dict of section name -> set of bullet chunks.  Any
++        bullets encountered before a section will have a name of None.
++    """
++    section_name = None
++    section_dict = {}
++    for chunk in chunks:
++        if chunk.kind == 'section':
++            section_name = chunk.text.split('\n', 1)[0]
++        elif chunk.kind == 'bullet':
++            try:
++                bullets = section_dict[section_name]
++            except KeyError:
++                bullets = section_dict[section_name] = set()
++            bullets.add(chunk)
++        else:
++            raise MergeTooHard(chunk)
++    return section_dict
++
++
++def sort_key(chunk):
++    return chunk.text.replace('`', '').lower()
++
++
++def canonicalise_news_file(news_file, merger):
++    new_chunks = []
++    for chunk in news_file.chunks:
++        if chunk.kind == 'release':
++            chunk = canonicalise_release(chunk, merger)
++        new_chunks.append(chunk)
++    news_file = copy.copy(news_file)
++    news_file.chunks = new_chunks
++    return news_file
++
++
++def canonicalise_release(release, merger):
++    preamble = True
++    new_chunks = []
++    sections = []
++    for chunk in release.chunks:
++        if preamble and chunk.kind != 'section':
++            new_chunks.append(chunk)
++            continue
++        elif chunk.kind == 'section':
++            preamble = False
++            section = canonicalise_section(chunk)
++            sections.append(section)
++        else:
++            # not preamble, not section... must be trailing garbage.  Blah.
++            # XXX: should probably raise an error or something.  For now just
++            # add it to new_chunks, it'll become part of the preamble.
++            new_chunks.append(chunk)
++
++    # Sort the sections by name
++    sections = merger.sort_sections(sections)
++    # Combine duplicated sections (which will be adjacent after the sorting)
++    canonical_sections = []
++    for section in sections:
++        if canonical_sections and canonical_sections[-1].text == section.text:
++            # Identical.  Combine them.
++            chunks = canonical_sections[-1].chunks + section.chunks
++            section = copy.copy(section)
++            section.chunks = chunks
++            section = canonicalise_section(section)
++            canonical_sections[-1] = section
++            continue
++        else:
++            canonical_sections.append(section)
++    new_chunks.extend(canonical_sections)
++    release = copy.copy(release)
++    release.chunks = new_chunks
++    return release
++
++
++def canonicalise_section(section):
++    preamble = True
++    new_chunks = []
++    bullets = set()
++    for chunk in section.chunks:
++        if preamble and chunk.kind != 'bullet':
++            new_chunks.append(chunk)
++            continue
++        elif chunk.kind == 'bullet':
++            preamble = False
++            bullets.add(chunk.text)
++        else:
++            # not preamble, not bullet... must be trailing garbage.  Blah.
++            # XXX: should probably raise an error or something.  For now just
++            # add it to new_chunks, it'll become part of the preamble.
++            new_chunks.append(chunk)
++    new_section = copy.copy(section)
++    new_section.chunks = new_chunks
++    bullets = sorted(bullets, key=sort_key)
++    for bullet in bullets:
++        new_section.add_leaf('bullet', bullet)
++    return new_section
++
 === modified file 'bzrlib/plugins/news_merge/parser.py'
 --- bzrlib/plugins/news_merge/parser.py	2010-01-18 07:00:11 +0000
 +++ bzrlib/plugins/news_merge/parser.py	2010-04-20 13:35:41 +0000
@@ -24,6 +24,194 @@
  simple_parse's docstring).
  """
++# [root]
++# - Heading
++# - Text
++# - Release
++#   - Text
++#   - Section
++#     - Bullet
++#   - Section
++#     - Bullet
++#     - Bullet
++# - Release
++#   - Text
++#   - Section
++#     - Bullet
++#   - Section
++#     - Text
++#     - Bullet
++# - Text
++
++class ContainerChunk(object):
++
++    def __init__(self, kind, text):
++        self.chunks = []
++        self.kind = kind
++        self.text = text
++
++    def __repr__(self):
++        if len(self.text) > 20:
++            abbr_text = self.text[:20] + '...'
++        else:
++            abbr_text = self.text
++        return '<%s kind=%s text=%s>' % (
++            self.__class__.__name__, self.kind, repr(abbr_text))
++
++    def __cmp__(self, other):
++        if not isinstance(other, ContainerChunk):
++            return NotImplemented
++        return cmp(
++            (self.kind, self.text, self.chunks),
++            (other.kind, other.text, other.chunks))
++
++    def __hash__(self):
++        return hash((self.kind, self.text, tuple(self.chunks)))
++
++#    def __eq__(self, other):
++#        return (
++#            self.kind == other.kind and
++#            self.text == other.text and
++#            self.chunks == other.chunks)
++#
++#    def __lt__(self, other):
++#        return (
++#            self.kind < other.kind or
++#            self.text < other.text or
++#            self.chunks < other.chunks)
++#
++    def add_container(self, kind, text):
++        container = ContainerChunk(kind, text)
++        self.chunks.append(container)
++        return container
++
++    def add_leaf(self, kind, text):
++        if kind == 'blank':
++            # Attach this blank text to the previous chunk (which might be
++            # self), rather than tracking it as its own leaf.
++            if self.chunks:
++                self.chunks[-1].text += text
++            else:
++                self.text += text
++            return
++        chunk = LeafChunk(kind, text)
++        self.chunks.append(chunk)
++        return chunk
++
++    def flatten(self):
++        yield self
++        for chunk in self.chunks:
++            for elem in chunk.flatten():
++                yield elem
++
++    def as_text_iter(self):
++        yield self.text
++        for chunk in self.chunks:
++            for elem in chunk.as_text_iter():
++                yield elem
++
++    def as_text(self):
++        return ''.join(self.as_text_iter())
++
++
++class NewsFile(ContainerChunk):
++
++    def __init__(self):
++        ContainerChunk.__init__(self, '(root)', '')
++
++
++class LeafChunk(object):
++
++    def __init__(self, kind, text):
++        self.kind = kind
++        self.text = text
++
++    def __repr__(self):
++        if len(self.text) > 20:
++            abbr_text = self.text[:20] + '...'
++        else:
++            abbr_text = self.text
++        return '<%s kind=%s text=%s>' % (
++            self.__class__.__name__, self.kind, repr(abbr_text))
++
++    def __cmp__(self, other):
++        if not isinstance(other, LeafChunk):
++            return NotImplemented
++        return cmp((self.kind, self.text), (other.kind, other.text))
++
++    def __hash__(self):
++        return hash((self.kind, self.text))
++#
++#    def __eq__(self, other):
++#        return (self.kind == other.kind and self.text == other.text)
++#
++#    def __lt__(self, other):
++#        return (self.kind < other.kind or self.text < other.text)
++#
++    def flatten(self):
++        yield self
++
++    def as_text_iter(self):
++        yield self.text
++
++
++import re
++
++
++class ParseState(object):
++    def __init__(self):
++        #self.news_file = NewsFile()
++        self.object_stack = []
++
++
++class BadNewsFile(Exception):
++    """The NEWS file could not be parsed."""
++
++
++def parse_lines_to_structure(lines):
++    """Same as parse_to_structure, but takes an iterable of strs rather than a
++    single str.
++    """
++    return parse_to_structure(''.join(lines))
++
++
++def parse_to_structure(content):
++    news_file = NewsFile()
++    leaf_kinds = ('bullet', 'empty', 'text', 'blank')
++    # There's a strict hierarchy:
++    #   Headings contain releases contain sections
++    # Releases never contain releases, etc.
++    # (Any container may contain a leaf, though.)
++    container_hierarchy = ['(root)', 'heading', 'release', 'section']
++
++    stack = [news_file]
++    #import pdb; pdb.set_trace()
++    for kind, text in simple_parse(content):
++        #print kind, repr(text)
++        if kind in leaf_kinds:
++            stack[-1].add_leaf(kind, text)
++        elif kind in container_hierarchy:
++            # Pop the container stack until we find the right level to add this
++            # chunk.
++            new_rank = container_hierarchy.index(kind)
++            while True:
++                old_rank = container_hierarchy.index(stack[-1].kind)
++                if new_rank > old_rank:
++                    break
++                stack.pop()
++            container = stack[-1].add_container(kind, text)
++            stack.append(container)
++        else:
++            raise AssertionError('unexpected chunk kind: %r' % (kind,))
++    return news_file
++
++
++def simple_parse_lines(lines):
++    """Same as simple_parse, but takes an iterable of strs rather than a single
++    str.
++    """
++    return simple_parse(''.join(lines))
++
  def simple_parse(content):
      """Returns blocks, where each block is a 2-tuple (kind, text).
@@ -31,8 +219,15 @@
      :kind: one of 'heading', 'release', 'section', 'empty' or 'text'.
      :text: a str, including newlines.
      """
--    blocks = content.split('\n\n')
++    # Split on blank lines.
++    blankline_re = '(\n *\n)'
++    blocks = re.split(blankline_re, content)
      for block in blocks:
++        match = re.match(blankline_re, block)
++        if match is not None and match.groups()[0] == block:
++            # blank line
++            yield 'blank', block
++            continue
          if block.startswith('###'):
              # First line is ###...: Top heading
              yield 'heading', block
 === modified file 'bzrlib/plugins/news_merge/tests/__init__.py'
 --- bzrlib/plugins/news_merge/tests/__init__.py	2010-01-20 16:05:28 +0000
 +++ bzrlib/plugins/news_merge/tests/__init__.py	2010-04-20 13:35:41 +0000
@@ -16,6 +16,7 @@
  def load_tests(basic_tests, module, loader):
      testmod_names = [
++        'test_parser',
          'test_news_merge',
          ]
      basic_tests.addTest(loader.loadTestsFromModuleNames(
 === added file 'bzrlib/plugins/news_merge/tests/test_parser.py'
 --- bzrlib/plugins/news_merge/tests/test_parser.py	1970-01-01 00:00:00 +0000
 +++ bzrlib/plugins/news_merge/tests/test_parser.py	2010-04-20 13:35:41 +0000
@@ -0,0 +1,116 @@
++# Copyright (C) 2010 by Canonical Ltd
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
++
++
++from bzrlib.tests import TestCase
++
++from bzrlib.plugins.news_merge import parser
++
++
++# Define an example NEWS file with the following structure:
++# [root]
++# - Heading
++# - Text
++# - Release
++#   - Text
++#   - Section
++#     - Bullet
++#   - Section
++#     - Bullet
++#     - Bullet
++# - Release
++#   - Text
++#   - Section
++#     - Bullet
++#   - Section
++#     - Text
++#     - Bullet
++# - Text
++
++example_file = """\
++####################
++Bazaar Release Notes
++####################
++
++.. contents:: List of Releases
++   :depth: 1
++
++bzr x.y.z (not released yet)
++############################
++
++:Codename: template
++:x.y.z: ???
++
++Compatibility Breaks
++********************
++
++* Bullet
++
++New Features
++************
++
++* Bullet 1
++
++* Bullet 2
++
++Bug Fixes
++*********
++
++bzr x.y.y
++#########
++
++:Codename: previous
++
++Compatibility Breaks
++********************
++
++* Bullet
++
++New Features
++************
++
++Preamble text for section.
++
++* Bullet, not text.
++
++Footnote.
++"""
++
++class TestStructuredParseSmokeTests(TestCase):
++    """Smoke tests parse_to_structure using example_file."""
++
++    def test_parse(self):
++        """example_file can be parsed without an error."""
++        news_file = parser.parse_to_structure(example_file)
++
++    def test_roundtrip(self):
++        """The NewsFile object can regenerate the original bytes."""
++        news_file = parser.parse_to_structure(example_file)
++        self.assertEqualDiff(example_file, news_file.as_text())
++
++    def test_flatten(self):
++        """NewsFile.flatten shows the file has been interpreted as
++        releases/sections/bullets etc.
++        """
++        news_file = parser.parse_to_structure(example_file)
++        expected_kinds = ['(root)', 'heading', 'text', 'release', 'text',
++            'section', 'bullet', 'section', 'bullet', 'bullet', 'section',
++            'release', 'text', 'section', 'bullet', 'section', 'text',
++            'bullet', 'text']
++        kinds = [chunk.kind for chunk in news_file.flatten()
++                 if chunk.kind != 'blank']
++        self.assertEqual(expected_kinds, kinds)
++
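
The container stack used by `parse_to_structure` in the diff above can be illustrated with a standalone sketch. This is a simplified approximation rather than the patch's code: the `Node` class, `build_tree`, and `kinds` are hypothetical names, blank-line handling is omitted, and the input is assumed to be `(kind, text)` pairs that never include a `'(root)'` block.

```python
# Rank of each container kind; a container may only hold higher-ranked ones.
RANKS = {"(root)": 0, "heading": 1, "release": 2, "section": 3}


class Node(object):
    """Minimal stand-in for the patch's chunk classes."""

    def __init__(self, kind, text):
        self.kind = kind
        self.text = text
        self.children = []


def build_tree(blocks):
    """Nest (kind, text) blocks into a tree using a container stack."""
    root = Node("(root)", "")
    stack = [root]
    for kind, text in blocks:
        if kind in RANKS:
            # Pop until the top of the stack can contain this kind.
            while RANKS[stack[-1].kind] >= RANKS[kind]:
                stack.pop()
            node = Node(kind, text)
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Leaves (bullets, text) attach to the innermost open container.
            stack[-1].children.append(Node(kind, text))
    return root


def kinds(node):
    """Depth-first list of node kinds, comparable to flatten() in the patch."""
    result = [node.kind]
    for child in node.children:
        result.extend(kinds(child))
    return result
```

A second release closes the first release and its open section, so both releases end up as siblings under the same heading.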