Code review comment for lp:~ian-clatworthy/bzr/eol-st-ci-fix

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> Ian Clatworthy has proposed merging lp:~ian-clatworthy/bzr/eol-st-ci-fix into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This patch fixes some very bad bugs in content filtering. There are actually 2 separate issues fixed:
>
> 1. Branching from a non-content-filtered tree to a content-filtered one would produce an incorrect dirstate because the optimisation to reuse the source dirstate wasn't being disabled
>
> 2. Status detection was relying on size matching to determine equivalence when that assumption doesn't hold in the presence of content filtering.
>
> Both code changes are quite straightforward, though the latter one requires fixes in both Python and Pyrex code. Testing all of this is far more complex though. I've added a heap more units tests and I'm comfortable that status on content filtered trees now has pretty good coverage.
>
> I'm yet to add detailed tests for the branch/commit issue, because the monkey-patching approach used to test status isn't at the right level to work for branch. I have tested it manually though and one of the bug reporters (Frits) has confirmed that the code fixes works correctly for him. Given the related code change - switching off an optional optimisation - is very straight forward, I think this patch can land as is for 1.16rc1.
>
> FWIW, my plan is to add those additional tests soon. (Some pending changes to the monkey-patching used here to fix some other bugs will make that easier, but they aren't ready to go, and this really needs to be landed for content filtering to be unbroken for users.)
>

+ # Check the sha. We can't just rely on the size as
+ # content filtering may mean differ sizes actually
+ # map to the same content
+ if link_or_sha1 is None:

^- It would be nice if this was only done when there were actually
filters in play, rather than disabling the size check universally.

This change will cause us to sha1 the content of everything we are
committing 2 times. Once during iter_changes to make sure it really has
changed, and then once during the insert-into-repository code to record
its final sha, and ensure that it really has changed.

Note that ideally we would have an iter_changes that could provide
"maybe changed" and *never* sha content. Perhaps we can address that
when we re-visit dirstate. (So I'm saying it is probably ok for now, but
it is not the way we want to leave the code in the long run.)

...

                 # Note: do NOT move this logic up higher - using the
basis from
                 # the accelerator tree is still desirable because that
can save
                 # a minute or more of processing on large trees!
+ # The original tree may not have the same content filters
+ # applied so we can't safely build the inventory delta from
+ # the source tree.
                 if wt.supports_content_filtering():
                     accelerator_tree = None
+ delta_from_tree = False
+ else:
+ delta_from_tree = True

^- As long as you are getting into 'correctness', if the
*accelerator_tree* supports content filtering you get bad results, too.
(Branching from a filtered-tree into a non-filtered tree.)

What I don't understand is why "delta_from_tree" is no longer safe.

Anyway, the proper change would be in 'cmd_branch':

        accelerator_tree, br_from = bzrdir.BzrDir.open_tree_or_branch(
            from_location)
+ if accelerator_tree is not None and
+ accelerator_tree.supports_filtering:
+ accelerator_tree = None

I don't really see why adding a blackbox test that branching between
places with different rules doesn't continue to give correct results.
Assuming rules can be given with absolute paths, etc. You would do
something like:

  def test_branch_source_is_filtered(self):
    source = self.make_branch_and_tree('source')
    rules = get_rules()
    rules[source.basedir + '/foo.txt'] = XXXX
    self.build_tree(['source/foo.txt'])
    source.add()
    source.commit()
    self.run_bzr('branch source target')
    target = workingtree.WorkingTree.open('target')
    # Even though the content has different filters, iter_changes is
    # clean
    self.assertFileEqual("unfiltered content\n", 'target/foo.txt')
    changes = wt.changes_from(wt.basis_tree())
    self.assertFalse(changes.has_changes())

And then also add a

  def test_branch_target_is_filtered(self):

So I'd like to see the new tests before it lands, but otherwise

  review approve

(Consider this "tweak")

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkovwH4ACgkQJdeBCYSNAAMFTwCfTesTQECsm77/KMxiS9FtJJUe
AXkAn3J8VBOnMqk0W4FJA+ogtPN3SYeF
=c8hE
-----END PGP SIGNATURE-----

review: Approve

« Back to merge proposal