Bazaar

Merge lp:~ian-clatworthy/bzr/eol-st-ci-fix into lp:~bzr/bzr/trunk-old

Proposed by Ian Clatworthy on 2009-06-10

Status:

Merged

Merged at revision:

not available

Proposed branch:

lp:~ian-clatworthy/bzr/eol-st-ci-fix

Merge into:

lp:~bzr/bzr/trunk-old

Diff against target:

555 lines

To merge this branch:

bzr merge lp:~ian-clatworthy/bzr/eol-st-ci-fix

High

Fix Released

Reviewer	Review Type	Date Requested	Status
John A Meinel		2009-06-10	Approve on 2009-06-10
Review via email: mp+7269@code.launchpad.net

Ian Clatworthy (ian-clatworthy) wrote on 2009-06-10:

This patch fixes some very bad bugs in content filtering. There are actually 2 separate issues fixed:

1. Branching from a non-content-filtered tree to a content-filtered one would produce an incorrect dirstate because the optimisation to reuse the source dirstate wasn't being disabled.

2. Status detection was relying on size matching to determine equivalence when that assumption doesn't hold in the presence of content filtering.
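To see why issue 2 matters, here is a simplified model (not the dirstate code) of the old size-first check next to the patched sha-only check. It assumes the repository holds LF text and the working file was checked out with CRLF endings; `eol_read_filter` and the two `changed_*` functions are hypothetical names for illustration.

```python
import hashlib


def eol_read_filter(data):
    """Hypothetical 'read' filter: canonicalise CRLF line endings to LF."""
    return data.replace(b"\r\n", b"\n")


def canonical_sha1(working_bytes):
    # The sha1 is taken over the filtered (canonical) form of the file.
    return hashlib.sha1(eol_read_filter(working_bytes)).hexdigest()


def changed_size_first(working_bytes, basis_size, basis_sha1):
    """Old logic: a size mismatch is taken as proof of a change."""
    if len(working_bytes) != basis_size:
        return True
    return canonical_sha1(working_bytes) != basis_sha1


def changed_sha_only(working_bytes, basis_sha1):
    """Patched logic: always compare the sha1 of the canonical content."""
    return canonical_sha1(working_bytes) != basis_sha1


basis = b"hello\nworld\n"        # canonical content stored in the repository
working = b"hello\r\nworld\r\n"  # same text, checked out with CRLF endings
basis_sha1 = hashlib.sha1(basis).hexdigest()

print(changed_size_first(working, len(basis), basis_sha1))  # True (false positive)
print(changed_sha_only(working, basis_sha1))                # False (correctly clean)
```

The size test reports a change purely because the working file is two bytes longer, which is exactly the false positive the patch removes.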

Both code changes are quite straightforward, though the latter one requires fixes in both Python and Pyrex code. Testing all of this is far more complex, though. I've added a heap more unit tests and I'm comfortable that status on content-filtered trees now has pretty good coverage.
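The new eol tests in test_eol_conversion.py encode when status should come back clean after cloning a tree with a given eol setting (the `roundtrip_to` argument). Restated as a function of the originally committed text and the bytes the repository ended up storing, the expectations look roughly like this. `roundtrip_formats` and `is_mixed` are hypothetical names, the rule is read off the test cases rather than taken from bzrlib, and treating a NUL byte as "binary" is an assumption.

```python
# Storage-policy groups, copied from _LF_IN_REPO / _CRLF_IN_REPO in the patch.
LF_IN_REPO = ['native', 'lf', 'crlf']
CRLF_IN_REPO = ['%s-with-crlf-in-repo' % (f,) for f in LF_IN_REPO]


def is_mixed(text):
    """True if some, but not all, LF characters are preceded by CR."""
    crlf = text.count(b'\r\n')
    return 0 < crlf < text.count(b'\n')


def roundtrip_formats(original, stored):
    """eol settings (besides None and 'exact') expected to give clean status.

    original: the bytes first committed from the working tree.
    stored: the bytes the repository ended up holding for that commit.
    """
    if b'\x00' in original:
        # Assumed binary: eol filters leave it alone, so everything roundtrips.
        return LF_IN_REPO + CRLF_IN_REPO
    if is_mixed(original):
        # Dirty text (mixed line endings) never roundtrips.
        return []
    # Clean text roundtrips only to policies that store the same line endings.
    return list(CRLF_IN_REPO) if b'\r\n' in stored else list(LF_IN_REPO)


# Clean CRLF text committed with eol=native ends up stored as LF,
# mirroring test_eol_native_clean_crlf.
print(roundtrip_formats(b"hello\r\nworld\r\n", b"hello\nworld\n"))
```

For example, clean CRLF text committed under `native` is stored as LF, so only the LF-in-repo settings are expected to show a clean status afterwards.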

I'm yet to add detailed tests for the branch/commit issue, because the monkey-patching approach used to test status isn't at the right level to work for branch. I have tested it manually, though, and one of the bug reporters (Frits) has confirmed that the code fix works correctly for him. Given that the related code change (switching off an optional optimisation) is very straightforward, I think this patch can land as is for 1.16rc1.

FWIW, my plan is to add those additional tests soon. (Some pending changes to the monkey-patching used here to fix some other bugs will make that easier, but they aren't ready to go, and this really needs to be landed for content filtering to be unbroken for users.)
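The monkey-patching referred to here swaps `_content_filter_stack` for the duration of a test and restores it with `addCleanup`, as `test_branch_source_not_filtered_target_is` does in the diff. Below is a self-contained sketch of that pattern using the standard `unittest` module; the `WorkingTree` class is a hypothetical stand-in for the bzrlib one, and `_swapcase` is a simplified filter that takes a string rather than bzrlib's real filter signature.

```python
import unittest


class WorkingTree(object):
    """Hypothetical stand-in; the real class lives in bzrlib.workingtree."""

    def _content_filter_stack(self, path=None, file_id=None):
        return []


def _swapcase(text):
    # Symmetric filter: applying it twice restores the input.
    return text.swapcase()


class PatchedFilterStackTest(unittest.TestCase):

    def test_filters_only_txt_files(self):
        real = WorkingTree._content_filter_stack
        # addCleanup restores the real method even if the test fails.
        self.addCleanup(setattr, WorkingTree, '_content_filter_stack', real)

        def _content_filter_stack(tree, path=None, file_id=None):
            if path.endswith('.txt'):
                return [_swapcase]
            return []
        # Patch the class, so every tree instance sees the fake stack.
        WorkingTree._content_filter_stack = _content_filter_stack

        tree = WorkingTree()
        self.assertEqual([_swapcase], tree._content_filter_stack('file1.txt'))
        self.assertEqual([], tree._content_filter_stack('file2.bin'))
        self.assertEqual('fOO tXT', _swapcase('Foo Txt'))
        self.assertEqual('Foo Txt', _swapcase(_swapcase('Foo Txt')))
```

Patching the class rather than a single tree instance means trees created indirectly (for example by `run_bzr('branch ...')`) also see the fake filter stack, which appears to be why the branch tests in the diff patch `WorkingTree` itself while `create_cf_tree` patches only one tree. Run it with `python -m unittest`.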

John A Meinel (jameinel) wrote on 2009-06-10:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> Ian Clatworthy has proposed merging lp:~ian-clatworthy/bzr/eol-st-ci-fix into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This patch fixes some very bad bugs in content filtering. There are actually 2 separate issues fixed:
>
> 1. Branching from a non-content-filtered tree to a content-filtered one would produce an incorrect dirstate because the optimisation to reuse the source dirstate wasn't being disabled.
>
> 2. Status detection was relying on size matching to determine equivalence when that assumption doesn't hold in the presence of content filtering.
>
> Both code changes are quite straightforward, though the latter one requires fixes in both Python and Pyrex code. Testing all of this is far more complex, though. I've added a heap more unit tests and I'm comfortable that status on content-filtered trees now has pretty good coverage.
>
> I'm yet to add detailed tests for the branch/commit issue, because the monkey-patching approach used to test status isn't at the right level to work for branch. I have tested it manually, though, and one of the bug reporters (Frits) has confirmed that the code fix works correctly for him. Given that the related code change (switching off an optional optimisation) is very straightforward, I think this patch can land as is for 1.16rc1.
>
> FWIW, my plan is to add those additional tests soon. (Some pending changes to the monkey-patching used here to fix some other bugs will make that easier, but they aren't ready to go, and this really needs to be landed for content filtering to be unbroken for users.)
>

+                        # Check the sha. We can't just rely on the size as
+                        # content filtering may mean differ sizes actually
+                        # map to the same content
+                        if link_or_sha1 is None:

^- It would be nice if this was only done when there were actually
filters in play, rather than disabling the size check universally.

This change will cause us to sha1 the content of everything we are
committing twice: once during iter_changes, to make sure it really has
changed, and then again in the insert-into-repository code, to record
its final sha and ensure that it really has changed.

Note that ideally we would have an iter_changes that could provide
"maybe changed" and *never* sha content. Perhaps we can address that
when we re-visit dirstate. (So I'm saying it is probably ok for now, but
it is not the way we want to leave the code in the long run.)
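The "only when filters are in play" idea could look roughly like the sketch below, which keeps the cheap size shortcut for unfiltered paths and falls back to hashing only when filters apply. It is a hypothetical illustration, not dirstate code: `content_changed`, its `filters` argument (a list of bytes-to-bytes read filters for the path) and `crlf_to_lf` are assumptions.

```python
import hashlib


def content_changed(working_bytes, basis_size, basis_sha1, filters):
    """Hypothetical change check that keeps the cheap size test when safe.

    `filters` is the list of read filters (callables taking and returning
    bytes) that apply to this path; an empty list means no filtering.
    """
    if not filters and len(working_bytes) != basis_size:
        # Unfiltered: working size equals canonical size, so a size
        # mismatch proves a change without hashing anything.
        return True
    canonical = working_bytes
    for read_filter in filters:
        canonical = read_filter(canonical)
    return hashlib.sha1(canonical).hexdigest() != basis_sha1


def crlf_to_lf(data):
    # Hypothetical eol read filter.
    return data.replace(b"\r\n", b"\n")


basis = b"hello\nworld\n"
basis_sha1 = hashlib.sha1(basis).hexdigest()
print(content_changed(b"hello\r\nworld\r\n", len(basis), basis_sha1, [crlf_to_lf]))  # False
print(content_changed(b"hello\nworld!\n", len(basis), basis_sha1, []))              # True
```

The double-hashing cost described above remains for filtered paths; only unfiltered paths with a size mismatch skip the sha1.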

...

                 # Note: do NOT move this logic up higher - using the basis from
                 # the accelerator tree is still desirable because that can save
                 # a minute or more of processing on large trees!
+                # The original tree may not have the same content filters
+                # applied so we can't safely build the inventory delta from
+                # the source tree.
                 if wt.supports_content_filtering():
                     accelerator_tree = None
+                    delta_from_tree = False
+                else:
+                    delta_from_tree = True

^- As long as you are getting into 'correctness', if the
*accelerator_tree* supports content filtering you get bad results, too.
(Branching from a filtered-tree into a non-filtered tree.)

What I don't understand is why "delta_from_tree" is no longer safe.

Anyway, the proper change would be in 'cmd_branch':

         accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(
             from_location)
+        if (accelerator_tree is not None and
+            accelerator_tree.supports_content_filtering()):
+            accelerator_tree = None
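The reuse is unsafe in both directions: the quoted hunk disables the accelerator tree when the target tree supports filtering, and the cmd_branch change covers a filtering source. A sketch of a single guard that checks both sides, using a hypothetical stub class that models only `supports_content_filtering()` (the method the patch itself calls). It assumes, as the name suggests, that this reports a format capability rather than whether any rules are active, so the guard is conservative.

```python
class StubTree(object):
    """Hypothetical stand-in for a working tree."""

    def __init__(self, filtering):
        self._filtering = filtering

    def supports_content_filtering(self):
        return self._filtering


def usable_accelerator(accelerator_tree, target_tree):
    """Return the tree to reuse content from, or None to build from the repo.

    Hypothetical helper: on-disk bytes are only safe to reuse when neither
    side may apply content filters.
    """
    if accelerator_tree is None:
        return None
    if accelerator_tree.supports_content_filtering():
        return None  # source files may hold converted (non-canonical) bytes
    if target_tree is not None and target_tree.supports_content_filtering():
        return None  # target may need differently converted bytes
    return accelerator_tree


plain, filtered = StubTree(False), StubTree(True)
print(usable_accelerator(plain, plain) is plain)  # True
print(usable_accelerator(filtered, plain))        # None
```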

I don't really see why we can't add a blackbox test checking that
branching between places with different rules continues to give correct
results, assuming rules can be given with absolute paths, etc. You would
do something like:

def test_branch_source_is_filtered(self):
    source = self.make_branch_and_tree('source')
    # get_rules() and XXXX are placeholders for however the test would
    # install a filtering rule for source/foo.txt
    rules = get_rules()
    rules[source.basedir + '/foo.txt'] = XXXX
    self.build_tree_contents([('source/foo.txt', 'unfiltered content\n')])
    source.add(['foo.txt'])
    source.commit('add foo.txt')
    self.run_bzr('branch source target')
    target = workingtree.WorkingTree.open('target')
    # Even though the content has different filters, iter_changes is
    # clean
    self.assertFileEqual("unfiltered content\n", 'target/foo.txt')
    changes = target.changes_from(target.basis_tree())
    self.assertFalse(changes.has_changed())

And then also add a

def test_branch_target_is_filtered(self):

So I'd like to see the new tests before it lands, but otherwise

review approve

(Consider this "tweak")

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkovwH4ACgkQJdeBCYSNAAMFTwCfTesTQECsm77/KMxiS9FtJJUe
AXkAn3J8VBOnMqk0W4FJA+ogtPN3SYeF
=c8hE
-----END PGP SIGNATURE-----

review: Approve

Subscribers

People subscribed via source and target branches

to all changes:

Aaron Bentley

Denys Duchier

Eric Siegerman

Gary van der Merwe

Ian Clatworthy

Jelmer Vernooij

John Szakmeister

Jonathan Lange

Marius Kruger

Martin Albisetti

Matt Nordhoff

Paul Hummer

SuperMMX

Talden

Yoshinori Sano

to status/vote changes:

Alexander Belchenko

Martin Eisenhardt

Tim Penhey

Vincent Ladeuil

Preview Diff

 === modified file 'NEWS'
 --- NEWS	2009-06-11 00:47:41 +0000
 +++ NEWS	2009-06-11 04:35:14 +0000
@@ -81,6 +81,9 @@
    the fetched revisions, not in the stacked-on ancestry.
    (John Arbash Meinel)
++* Fix status and commit to work with content filtered trees, addressing
++  numerous bad bugs with line-ending support. (Ian Clatworthy, #362030)
++
  * Fix problem of "directory not empty" when contending for a lock over
    sftp.  (Martin Pool, #340352)
 === modified file 'bzrlib/_dirstate_helpers_c.pyx'
 --- bzrlib/_dirstate_helpers_c.pyx	2009-03-23 14:59:43 +0000
 +++ bzrlib/_dirstate_helpers_c.pyx	2009-06-11 04:35:14 +0000
@@ -1140,19 +1140,17 @@
                      if source_minikind != c'f':
                          content_change = 1
                      else:
--                        # If the size is the same, check the sha:
--                        if target_details[2] == source_details[2]:
--                            if link_or_sha1 is None:
--                                # Stat cache miss:
--                                statvalue, link_or_sha1 = \
--                                    self.state._sha1_provider.stat_and_sha1(
--                                    path_info[4])
--                                self.state._observed_sha1(entry, link_or_sha1,
--                                    statvalue)
--                            content_change = (link_or_sha1 != source_details[1])
--                        else:
--                            # Size changed, so must be different
--                            content_change = 1
++                        # Check the sha. We can't just rely on the size as
++                        # content filtering may mean differ sizes actually
++                        # map to the same content
++                        if link_or_sha1 is None:
++                            # Stat cache miss:
++                            statvalue, link_or_sha1 = \
++                                self.state._sha1_provider.stat_and_sha1(
++                                path_info[4])
++                            self.state._observed_sha1(entry, link_or_sha1,
++                                statvalue)
++                        content_change = (link_or_sha1 != source_details[1])
                      # Target details is updated at update_entry time
                      if self.use_filesystem_for_exec:
                          # We don't need S_ISREG here, because we are sure
 === modified file 'bzrlib/builtins.py'
 --- bzrlib/builtins.py	2009-06-10 03:56:49 +0000
 +++ bzrlib/builtins.py	2009-06-11 04:35:14 +0000
@@ -1112,6 +1112,9 @@
          accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(
              from_location)
++        if (accelerator_tree is not None and
++            accelerator_tree.supports_content_filtering()):
++            accelerator_tree = None
          revision = _get_one_revision('branch', revision)
          br_from.lock_read()
          try:
 === modified file 'bzrlib/dirstate.py'
 --- bzrlib/dirstate.py	2009-05-06 05:36:28 +0000
 +++ bzrlib/dirstate.py	2009-06-11 04:35:14 +0000
@@ -3062,19 +3062,17 @@
                      if source_minikind != 'f':
                          content_change = True
                      else:
--                        # If the size is the same, check the sha:
--                        if target_details[2] == source_details[2]:
--                            if link_or_sha1 is None:
--                                # Stat cache miss:
--                                statvalue, link_or_sha1 = \
--                                    self.state._sha1_provider.stat_and_sha1(
--                                    path_info[4])
--                                self.state._observed_sha1(entry, link_or_sha1,
--                                    statvalue)
--                            content_change = (link_or_sha1 != source_details[1])
--                        else:
--                            # Size changed, so must be different
--                            content_change = True
++                        # Check the sha. We can't just rely on the size as
++                        # content filtering may mean differ sizes actually
++                        # map to the same content
++                        if link_or_sha1 is None:
++                            # Stat cache miss:
++                            statvalue, link_or_sha1 = \
++                                self.state._sha1_provider.stat_and_sha1(
++                                path_info[4])
++                            self.state._observed_sha1(entry, link_or_sha1,
++                                statvalue)
++                        content_change = (link_or_sha1 != source_details[1])
                      # Target details is updated at update_entry time
                      if self.use_filesystem_for_exec:
                          # We don't need S_ISREG here, because we are sure
 === modified file 'bzrlib/tests/workingtree_implementations/test_content_filters.py'
 --- bzrlib/tests/workingtree_implementations/test_content_filters.py	2009-03-23 14:59:43 +0000
 +++ bzrlib/tests/workingtree_implementations/test_content_filters.py	2009-06-11 04:35:14 +0000
@@ -17,6 +17,7 @@
  """Tests for content filtering conformance"""
  from bzrlib.filters import ContentFilter
++from bzrlib.workingtree import WorkingTree
  from bzrlib.tests.workingtree_implementations import TestCaseWithWorkingTree
@@ -44,8 +45,8 @@
  class TestWorkingTreeWithContentFilters(TestCaseWithWorkingTree):
--    def create_cf_tree(self, txt_reader, txt_writer):
--        tree = self.make_branch_and_tree('.')
++    def create_cf_tree(self, txt_reader, txt_writer, dir='.'):
++        tree = self.make_branch_and_tree(dir)
          def _content_filter_stack(path=None, file_id=None):
              if path.endswith('.txt'):
                  return [ContentFilter(txt_reader, txt_writer)]
@@ -53,8 +54,8 @@
                  return []
          tree._content_filter_stack = _content_filter_stack
          self.build_tree_contents([
--            ('file1.txt', 'Foo Txt'),
--            ('file2.bin', 'Foo Bin')])
++            (dir + '/file1.txt', 'Foo Txt'),
++            (dir + '/file2.bin', 'Foo Bin')])
          tree.add(['file1.txt', 'file2.bin'])
          tree.commit('commit raw content')
          txt_fileid = tree.path2id('file1.txt')
@@ -104,3 +105,57 @@
              filtered=False).read())
          self.assertEqual('Foo Bin', tree.get_file(bin_fileid,
              filtered=False).read())
++
++    def test_branch_source_filtered_target_not(self):
++        # Create a source branch with content filtering
++        source, txt_fileid, bin_fileid = self.create_cf_tree(
++            txt_reader=_uppercase, txt_writer=_lowercase, dir='source')
++        if not source.supports_content_filtering():
++            return
++        self.assertFileEqual("Foo Txt", 'source/file1.txt')
++        basis = source.basis_tree()
++        basis.lock_read()
++        self.addCleanup(basis.unlock)
++        self.assertEqual("FOO TXT", basis.get_file_text(txt_fileid))
++
++        # Now branch it
++        self.run_bzr('branch source target')
++        target = WorkingTree.open('target')
++        # Even though the content in source and target are different
++        # due to different filters, iter_changes should be clean
++        self.assertFileEqual("FOO TXT", 'target/file1.txt')
++        changes = target.changes_from(source.basis_tree())
++        self.assertFalse(changes.has_changed())
++
++    def test_branch_source_not_filtered_target_is(self):
++        # Create a source branch with content filtering
++        source, txt_fileid, bin_fileid = self.create_cf_tree(
++            txt_reader=None, txt_writer=None, dir='source')
++        if not source.supports_content_filtering():
++            return
++        self.assertFileEqual("Foo Txt", 'source/file1.txt')
++        basis = source.basis_tree()
++        basis.lock_read()
++        self.addCleanup(basis.unlock)
++        self.assertEqual("Foo Txt", basis.get_file_text(txt_fileid))
++
++        # Patch in a custom, symmetric content filter stack
++        self.real_content_filter_stack = WorkingTree._content_filter_stack
++        def restore_real_content_filter_stack():
++            WorkingTree._content_filter_stack = self.real_content_filter_stack
++        self.addCleanup(restore_real_content_filter_stack)
++        def _content_filter_stack(tree, path=None, file_id=None):
++            if path.endswith('.txt'):
++                return [ContentFilter(_swapcase, _swapcase)]
++            else:
++                return []
++        WorkingTree._content_filter_stack = _content_filter_stack
++
++        # Now branch it
++        self.run_bzr('branch source target')
++        target = WorkingTree.open('target')
++        # Even though the content in source and target are different
++        # due to different filters, iter_changes should be clean
++        self.assertFileEqual("fOO tXT", 'target/file1.txt')
++        changes = target.changes_from(source.basis_tree())
++        self.assertFalse(changes.has_changed())
 === modified file 'bzrlib/tests/workingtree_implementations/test_eol_conversion.py'
 --- bzrlib/tests/workingtree_implementations/test_eol_conversion.py	2009-04-02 04:12:11 +0000
 +++ bzrlib/tests/workingtree_implementations/test_eol_conversion.py	2009-06-11 04:35:14 +0000
@@ -17,8 +17,9 @@
  """Tests for eol conversion."""
  import sys
++from cStringIO import StringIO
--from bzrlib import rules
++from bzrlib import rules, status
  from bzrlib.tests import TestSkipped
  from bzrlib.tests.workingtree_implementations import TestCaseWithWorkingTree
  from bzrlib.workingtree import WorkingTree
@@ -29,6 +30,13 @@
  _sample_text_on_win  = """hello\r\nworld\r\n"""
  _sample_text_on_unix = """hello\nworld\n"""
  _sample_binary       = """hello\nworld\r\n\x00"""
++_sample_clean_lf     = _sample_text_on_unix
++_sample_clean_crlf   = _sample_text_on_win
++
++
++# Lists of formats for each storage policy
++_LF_IN_REPO = ['native', 'lf', 'crlf']
++_CRLF_IN_REPO = [ '%s-with-crlf-in-repo' % (f,) for f in _LF_IN_REPO]
  class TestEolConversion(TestCaseWithWorkingTree):
@@ -74,8 +82,11 @@
          return t, basis
      def assertNewContentForSetting(self, wt, eol, expected_unix,
--        expected_win=None):
--        """Clone a working tree and check the convenience content."""
++        expected_win, roundtrip):
++        """Clone a working tree and check the convenience content.
++
++        If roundtrip is True, status and commit should see no changes.
++        """
          if expected_win is None:
              expected_win = expected_unix
          self.patch_rules_searcher(eol)
@@ -86,100 +97,241 @@
              self.assertEqual(expected_win, content)
          else:
              self.assertEqual(expected_unix, content)
++        # Confirm that status thinks nothing has changed if the text roundtrips
++        if roundtrip:
++            status_io = StringIO()
++            status.show_tree_status(wt2, to_file=status_io)
++            self.assertEqual('', status_io.getvalue())
      def assertContent(self, wt, basis, expected_raw, expected_unix,
--        expected_win):
--        """Check the committed content and content in cloned trees."""
++        expected_win, roundtrip_to=None):
++        """Check the committed content and content in cloned trees.
++
++        :param roundtrip_to: the set of formats (excluding exact) we
++          can round-trip to or None for all
++        """
          basis_content = basis.get_file('file1-id').read()
          self.assertEqual(expected_raw, basis_content)
--        self.assertNewContentForSetting(wt, None, expected_raw)
++
++        # No setting and exact should always roundtrip
++        self.assertNewContentForSetting(wt, None,
++            expected_raw, expected_raw, roundtrip=True)
++        self.assertNewContentForSetting(wt, 'exact',
++            expected_raw, expected_raw, roundtrip=True)
++
++        # Roundtripping is otherwise dependent on whether the original
++        # text is clean - mixed line endings will prevent it. It also
++        # depends on whether the format in the repository is being changed.
++        if roundtrip_to is None:
++            roundtrip_to = _LF_IN_REPO + _CRLF_IN_REPO
          self.assertNewContentForSetting(wt, 'native',
--            expected_unix, expected_win)
++            expected_unix, expected_win, 'native' in roundtrip_to)
          self.assertNewContentForSetting(wt, 'lf',
--            expected_unix, expected_unix)
++            expected_unix, expected_unix, 'lf' in roundtrip_to)
          self.assertNewContentForSetting(wt, 'crlf',
--            expected_win, expected_win)
++            expected_win, expected_win, 'crlf' in roundtrip_to)
          self.assertNewContentForSetting(wt, 'native-with-crlf-in-repo',
--            expected_unix, expected_win)
++            expected_unix, expected_win,
++            'native-with-crlf-in-repo' in roundtrip_to)
          self.assertNewContentForSetting(wt, 'lf-with-crlf-in-repo',
--            expected_unix, expected_unix)
++            expected_unix, expected_unix,
++            'lf-with-crlf-in-repo' in roundtrip_to)
          self.assertNewContentForSetting(wt, 'crlf-with-crlf-in-repo',
--            expected_win, expected_win)
--        self.assertNewContentForSetting(wt, 'exact', expected_raw)
--
--    def test_eol_no_rules(self):
--        wt, basis = self.prepare_tree(_sample_text)
--        self.assertContent(wt, basis, _sample_text,
--            _sample_text_on_unix, _sample_text_on_win)
--
--    def test_eol_native(self):
--        wt, basis = self.prepare_tree(_sample_text, eol='native')
--        self.assertContent(wt, basis, _sample_text_on_unix,
--            _sample_text_on_unix, _sample_text_on_win)
++            expected_win, expected_win,
++            'crlf-with-crlf-in-repo' in roundtrip_to)
++
++    # Test binary files. These always roundtrip.
++
++    def test_eol_no_rules_binary(self):
++        wt, basis = self.prepare_tree(_sample_binary)
++        self.assertContent(wt, basis, _sample_binary, _sample_binary,
++            _sample_binary)
++
++    def test_eol_exact_binary(self):
++        wt, basis = self.prepare_tree(_sample_binary, eol='exact')
++        self.assertContent(wt, basis, _sample_binary, _sample_binary,
++            _sample_binary)
      def test_eol_native_binary(self):
          wt, basis = self.prepare_tree(_sample_binary, eol='native')
          self.assertContent(wt, basis, _sample_binary, _sample_binary,
              _sample_binary)
--    def test_eol_lf(self):
--        wt, basis = self.prepare_tree(_sample_text, eol='lf')
--        self.assertContent(wt, basis, _sample_text_on_unix,
--            _sample_text_on_unix, _sample_text_on_win)
--
      def test_eol_lf_binary(self):
          wt, basis = self.prepare_tree(_sample_binary, eol='lf')
          self.assertContent(wt, basis, _sample_binary, _sample_binary,
              _sample_binary)
--    def test_eol_crlf(self):
--        wt, basis = self.prepare_tree(_sample_text, eol='crlf')
--        self.assertContent(wt, basis, _sample_text_on_unix,
--            _sample_text_on_unix, _sample_text_on_win)
--
      def test_eol_crlf_binary(self):
          wt, basis = self.prepare_tree(_sample_binary, eol='crlf')
          self.assertContent(wt, basis, _sample_binary, _sample_binary,
              _sample_binary)
--    def test_eol_native_with_crlf_in_repo(self):
--        wt, basis = self.prepare_tree(_sample_text,
--            eol='native-with-crlf-in-repo')
--        self.assertContent(wt, basis, _sample_text_on_win,
--            _sample_text_on_unix, _sample_text_on_win)
--
      def test_eol_native_with_crlf_in_repo_binary(self):
          wt, basis = self.prepare_tree(_sample_binary,
              eol='native-with-crlf-in-repo')
          self.assertContent(wt, basis, _sample_binary, _sample_binary,
              _sample_binary)
--    def test_eol_lf_with_crlf_in_repo(self):
--        wt, basis = self.prepare_tree(_sample_text, eol='lf-with-crlf-in-repo')
--        self.assertContent(wt, basis, _sample_text_on_win,
--            _sample_text_on_unix, _sample_text_on_win)
--
      def test_eol_lf_with_crlf_in_repo_binary(self):
--        wt, basis = self.prepare_tree(_sample_binary, eol='lf-with-crlf-in-repo')
++        wt, basis = self.prepare_tree(_sample_binary,
++            eol='lf-with-crlf-in-repo')
          self.assertContent(wt, basis, _sample_binary, _sample_binary,
              _sample_binary)
--    def test_eol_crlf_with_crlf_in_repo(self):
--        wt, basis = self.prepare_tree(_sample_text, eol='crlf-with-crlf-in-repo')
--        self.assertContent(wt, basis, _sample_text_on_win,
--            _sample_text_on_unix, _sample_text_on_win)
--
      def test_eol_crlf_with_crlf_in_repo_binary(self):
--        wt, basis = self.prepare_tree(_sample_binary, eol='crlf-with-crlf-in-repo')
++        wt, basis = self.prepare_tree(_sample_binary,
++            eol='crlf-with-crlf-in-repo')
          self.assertContent(wt, basis, _sample_binary, _sample_binary,
              _sample_binary)
--    def test_eol_exact(self):
++    # Test text with mixed line endings ("dirty text").
++    # This doesn't roundtrip so status always thinks something has changed.
++
++    def test_eol_no_rules_dirty(self):
++        wt, basis = self.prepare_tree(_sample_text)
++        self.assertContent(wt, basis, _sample_text,
++            _sample_text_on_unix, _sample_text_on_win, roundtrip_to=[])
++
++    def test_eol_exact_dirty(self):
          wt, basis = self.prepare_tree(_sample_text, eol='exact')
          self.assertContent(wt, basis, _sample_text,
--            _sample_text_on_unix, _sample_text_on_win)
--
--    def test_eol_exact_binary(self):
--        wt, basis = self.prepare_tree(_sample_binary, eol='exact')
--        self.assertContent(wt, basis, _sample_binary, _sample_binary,
--            _sample_binary)
++            _sample_text_on_unix, _sample_text_on_win, roundtrip_to=[])
++
++    def test_eol_native_dirty(self):
++        wt, basis = self.prepare_tree(_sample_text, eol='native')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win, roundtrip_to=[])
++
++    def test_eol_lf_dirty(self):
++        wt, basis = self.prepare_tree(_sample_text, eol='lf')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win, roundtrip_to=[])
++
++    def test_eol_crlf_dirty(self):
++        wt, basis = self.prepare_tree(_sample_text, eol='crlf')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win, roundtrip_to=[])
++
++    def test_eol_native_with_crlf_in_repo_dirty(self):
++        wt, basis = self.prepare_tree(_sample_text,
++            eol='native-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win, roundtrip_to=[])
++
++    def test_eol_lf_with_crlf_in_repo_dirty(self):
++        wt, basis = self.prepare_tree(_sample_text,
++            eol='lf-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win, roundtrip_to=[])
++
++    def test_eol_crlf_with_crlf_in_repo_dirty(self):
++        wt, basis = self.prepare_tree(_sample_text,
++            eol='crlf-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win, roundtrip_to=[])
++
++    # Test text with clean line endings, either always lf or always crlf.
++    # This selectively roundtrips (based on what's stored in the repo).
++
++    def test_eol_no_rules_clean_lf(self):
++        wt, basis = self.prepare_tree(_sample_clean_lf)
++        self.assertContent(wt, basis, _sample_clean_lf,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_LF_IN_REPO)
++
++    def test_eol_no_rules_clean_crlf(self):
++        wt, basis = self.prepare_tree(_sample_clean_crlf)
++        self.assertContent(wt, basis, _sample_clean_crlf,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_CRLF_IN_REPO)
++
++    def test_eol_exact_clean_lf(self):
++        wt, basis = self.prepare_tree(_sample_clean_lf, eol='exact')
++        self.assertContent(wt, basis, _sample_clean_lf,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_LF_IN_REPO)
++
++    def test_eol_exact_clean_crlf(self):
++        wt, basis = self.prepare_tree(_sample_clean_crlf, eol='exact')
++        self.assertContent(wt, basis, _sample_clean_crlf,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_CRLF_IN_REPO)
++
++    def test_eol_native_clean_lf(self):
++        wt, basis = self.prepare_tree(_sample_clean_lf, eol='native')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_LF_IN_REPO)
++
++    def test_eol_native_clean_crlf(self):
++        wt, basis = self.prepare_tree(_sample_clean_crlf, eol='native')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_LF_IN_REPO)
++
++    def test_eol_lf_clean_lf(self):
++        wt, basis = self.prepare_tree(_sample_clean_lf, eol='lf')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_LF_IN_REPO)
++
++    def test_eol_lf_clean_crlf(self):
++        wt, basis = self.prepare_tree(_sample_clean_crlf, eol='lf')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_LF_IN_REPO)
++
++    def test_eol_crlf_clean_lf(self):
++        wt, basis = self.prepare_tree(_sample_clean_lf, eol='crlf')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_LF_IN_REPO)
++
++    def test_eol_crlf_clean_crlf(self):
++        wt, basis = self.prepare_tree(_sample_clean_crlf, eol='crlf')
++        self.assertContent(wt, basis, _sample_text_on_unix,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_LF_IN_REPO)
++
++    def test_eol_native_with_crlf_in_repo_clean_lf(self):
++        wt, basis = self.prepare_tree(_sample_clean_lf,
++            eol='native-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_CRLF_IN_REPO)
++
++    def test_eol_native_with_crlf_in_repo_clean_crlf(self):
++        wt, basis = self.prepare_tree(_sample_clean_crlf,
++            eol='native-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_CRLF_IN_REPO)
++
++    def test_eol_lf_with_crlf_in_repo_clean_lf(self):
++        wt, basis = self.prepare_tree(_sample_clean_lf,
++            eol='lf-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_CRLF_IN_REPO)
++
++    def test_eol_lf_with_crlf_in_repo_clean_crlf(self):
++        wt, basis = self.prepare_tree(_sample_clean_crlf,
++            eol='lf-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_CRLF_IN_REPO)
++
++    def test_eol_crlf_with_crlf_in_repo_clean_lf(self):
++        wt, basis = self.prepare_tree(_sample_clean_lf,
++            eol='crlf-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_CRLF_IN_REPO)
++
++    def test_eol_crlf_with_crlf_in_repo_clean_crlf(self):
++        wt, basis = self.prepare_tree(_sample_clean_crlf,
++            eol='crlf-with-crlf-in-repo')
++        self.assertContent(wt, basis, _sample_text_on_win,
++            _sample_text_on_unix, _sample_text_on_win,
++            roundtrip_to=_CRLF_IN_REPO)
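The expectations in these tests follow a simple pattern. With no rules or `eol=exact`, the committed bytes are unchanged. `native`, `lf` and `crlf` store LF in the repository, and the `*-with-crlf-in-repo` variants store CRLF. Below is a minimal, hypothetical model of that pattern. The helper names and tuples are illustrative and are not bzrlib's API. It assumes every line ends in LF or CRLF, and that each `*-with-crlf-in-repo` setting checks out like its plain counterpart.

```python
# Hypothetical model of the eol behaviour the tests above exercise.
# Illustrative only: these helpers are NOT bzrlib's API.

LF_IN_REPO = ('native', 'lf', 'crlf')
CRLF_IN_REPO = ('native-with-crlf-in-repo', 'lf-with-crlf-in-repo',
                'crlf-with-crlf-in-repo')


def to_lf(content):
    """Normalise CRLF line endings to LF."""
    return content.replace(b'\r\n', b'\n')


def to_crlf(content):
    """Normalise all line endings to CRLF (assumes no bare CR)."""
    return to_lf(content).replace(b'\n', b'\r\n')


def content_for_repo(content, eol=None):
    """Return the bytes that would be committed for an eol setting."""
    if eol is None or eol == 'exact':
        return content  # no filtering: stored byte-for-byte
    if eol in LF_IN_REPO:
        return to_lf(content)
    if eol in CRLF_IN_REPO:
        return to_crlf(content)
    raise ValueError('unknown eol setting: %r' % (eol,))


def content_for_checkout(stored, eol=None, on_windows=False):
    """Return the working-tree bytes produced from stored content."""
    if eol is None or eol == 'exact':
        return stored
    style = eol.split('-with-')[0]  # 'native', 'lf' or 'crlf'
    if style == 'native':
        style = 'crlf' if on_windows else 'lf'
    return to_crlf(stored) if style == 'crlf' else to_lf(stored)
```

Checkout can change the byte count: LF is stored, but CRLF is checked out on Windows. So the size of a working file need not match the size of the stored text. That is why, once filters are in play, a size mismatch alone no longer proves a file was modified.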
 === modified file 'bzrlib/workingtree_4.py'
 --- bzrlib/workingtree_4.py	2009-06-10 03:56:49 +0000
 +++ bzrlib/workingtree_4.py	2009-06-11 04:35:14 +0000
@@ -1440,13 +1440,20 @@
                  # Note: do NOT move this logic up higher - using the basis from
                  # the accelerator tree is still desirable because that can save
                  # a minute or more of processing on large trees!
++                # The original tree may not have the same content filters
++                # applied so we can't safely build the inventory delta from
++                # the source tree.
                  if wt.supports_content_filtering():
                      accelerator_tree = None
++                    delta_from_tree = False
++                else:
++                    delta_from_tree = True
                  # delta_from_tree is safe even for DirStateRevisionTrees,
                  # because wt4.apply_inventory_delta does not mutate the input
                  # inventory entries.
                  transform.build_tree(basis, wt, accelerator_tree,
--                                     hardlink=hardlink, delta_from_tree=True)
++                                     hardlink=hardlink,
++                                     delta_from_tree=delta_from_tree)
              finally:
                  basis.unlock()
          finally:
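The patched logic amounts to a two-way choice. When the target tree supports content filtering, it drops the accelerator tree and stops building the inventory delta from the source tree. The sketch below assumes, for illustration only, that the state worth reusing is a per-file (size, SHA-1) pair computed from working-tree bytes. `Entry`, `entry_for` and `choose_build_options` are hypothetical names and are not part of bzrlib. The sketch shows why data derived from an unfiltered source tree goes stale once the target checks files out through a filter.

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-in for per-file data cached from a working tree.
Entry = namedtuple('Entry', ['size', 'sha1'])


def entry_for(working_bytes):
    """Build the cached (size, sha1) pair for some working-tree bytes."""
    return Entry(len(working_bytes), hashlib.sha1(working_bytes).hexdigest())


def choose_build_options(accelerator_tree, target_filters_content):
    """Mirror the patched branch in workingtree_4.

    Returns (accelerator_tree, delta_from_tree). With content filtering on
    the target, data derived from the source tree cannot be trusted, so the
    accelerator is dropped and the delta is not built from the source tree.
    """
    if target_filters_content:
        return None, False
    return accelerator_tree, True
```

For example, suppose an unfiltered source holds `b'a\nb\n'` (4 bytes) and a filtered target checks out `b'a\r\nb\r\n'` (6 bytes). The two produce different entries, so copying the source's entry would record the wrong size and hash for the target.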