Merge lp:~spiv/bzr/chk-canonicalizer-613698 into lp:bzr
| Status: | Merged |
|---|---|
| Approved by: | Andrew Bennetts on 2010-08-16 |
| Approved revision: | 5381 |
| Merged at revision: | 5377 |
| Proposed branch: | lp:~spiv/bzr/chk-canonicalizer-613698 |
| Merge into: | lp:bzr |
| Diff against target: |
322 lines (+199/-10) 4 files modified
bzrlib/builtins.py (+8/-2) bzrlib/reconcile.py (+26/-7) bzrlib/repofmt/groupcompress_repo.py (+125/-1) bzrlib/versionedfile.py (+40/-0) |
| To merge this branch: | bzr merge lp:~spiv/bzr/chk-canonicalizer-613698 |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| bzr-core | 2010-08-11 | Pending | |
|
Review via email:
|
|||
Commit Message
Add hidden option 'bzr reconcile --canonicalize-
Description of the Change
This adds 'bzr reconcile --canonicalize-
Because so few people will ever run this code (not many repositories seem to have been adversely affected by bug 522637), in principle I'd have been happy to have this as a plugin where its rough edges wouldn't grate. But it would have required a bit more duplication of code than I'd like (although now it's done a bit less that I was expecting, I think). So I wrote this as part of the reconcile infrastructure in the core tool, but left it as a hidden option.
I didn't integrate it into the regular reconcile code path because a) it's 2x slower than regular reconcile already without doing the other things reconcile does, and b) it's going to be so rarely used.
The use of NoDupeAddLinesD
This takes about 3 minutes to fix the broken lp:do repository on my laptop.
| John A Meinel (jameinel) wrote : | # |
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Andrew Bennetts wrote:
> Andrew Bennetts has proposed merging lp:~spiv/bzr/chk-canonicalizer-613698 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #613698 need reconcile fixer for bug 522637 "missing referenced chk root keys"
> https:/
>
>
> This adds 'bzr reconcile --canonicalize-
>
> Because so few people will ever run this code (not many repositories seem to have been adversely affected by bug 522637), in principle I'd have been happy to have this as a plugin where its rough edges wouldn't grate. But it would have required a bit more duplication of code than I'd like (although now it's done a bit less that I was expecting, I think). So I wrote this as part of the reconcile infrastructure in the core tool, but left it as a hidden option.
>
> I didn't integrate it into the regular reconcile code path because a) it's 2x slower than regular reconcile already without doing the other things reconcile does, and b) it's going to be so rarely used.
>
> The use of NoDupeAddLinesD
>
> This takes about 3 minutes to fix the broken lp:do repository on my laptop.
>
btw, if we wanted to make this faster...
I think we could rework it so that the initial inventory is produced
from scratch, and then all others just use apply_inventory
(Possibly special cased if it actually finds an incorrect inventory.)
The total shape of the inventory is correct (otherwise all of this
wouldn't work anyway), and the deltas are correct. As such, it would
require a lot less regeneration of pages that are already correct.
That would make it possible to run it all reconcile runs, rather than
requiring a flag.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkx
0rUAn2FouKNap31
=9oPA
-----END PGP SIGNATURE-----
| Andrew Bennetts (spiv) wrote : | # |
John A Meinel wrote:
[...]
> - - repo_reconciler = self.repo.
> + if self.canonicali
> + repo_reconciler = self.repo.
> + else:
> + repo_reconciler = self.repo.
>
> ^- Why use a new function rather than passing the parameter through?
>
> I know there is a design principle that says don't add a parameter that
> is static for callers (always True, always False for a given caller).
> This isn't quite the same, since a user is selecting it.
The reason was that I didn't want to add it to all implementations of
reconcile, which felt a bit wrong considering it only applies to one
repository type.
> Anyway, it *felt* to me more like a parameter to .reconcile()
> (especially since it is a parameter up the stack).
>
> The other thing I'm not sure about, it looks like you added:
> + @needs_write_lock
> + def reconcile_
>
> only to one repository type, which means you'd get an attribute error
> running reconcile --canonicalize-chks on another repo type. But maybe I
> misread that. (Better to just raise a BzrCommandError, I guess)
Yeah, I'll make cmd_reconcile check the format and raise that.
> +class NoDupeAddLinesD
>
> ^- Is this actually necessary. I would have sworn I had a patch earlier
> that did something exactly like this, because I saw conversions (I
> think) waste tons of space that disappeared at repack. Then again, maybe
> that was in InterDifferingS
Yep, it's necessary. 200MB vs. 5MB!
> Anyway, I think it might be reasonable to just push this as the common
> case, rather than needing a wrapper. Certainly there you might be able
> to do it better than a get_parent_map() call.
Yeah, quite possibly. This implementation does have the cost of
querying the index for 1 key every add_lines, which would be pretty bad
for performance where latency is a concern. But I'm also a bit
reluctant to make add_lines take any more parameters...
-Andrew.
| Andrew Bennetts (spiv) wrote : | # |
John A Meinel wrote:
[...]
> btw, if we wanted to make this faster...
>
> I think we could rework it so that the initial inventory is produced
> from scratch, and then all others just use apply_inventory
> (Possibly special cased if it actually finds an incorrect inventory.)
That's a good point (although it relies on apply_inventory
as reliable as CHKMap.from_dict, which was the bug in the first place,
but that seems reasonable).
I'd prefer to stop spending time on this, though :) The number of
affected repositories, as judged from the bug report, seems to be pretty
low, which makes sense given the fairly particular conditions needed to
trigger it (deleting < 5 entries from an inventory that is just over the
size that splits it from one CHK node to many). So for now I think it's
good enough to get this relatively crude repair tool in the hands of
the few users that need it. We can always come back and improve it
later if there's a reason to do that.
-Andrew.
| John A Meinel (jameinel) wrote : | # |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Andrew Bennetts wrote:
> John A Meinel wrote:
> [...]
>> btw, if we wanted to make this faster...
>>
>> I think we could rework it so that the initial inventory is produced
>> from scratch, and then all others just use apply_inventory
>> (Possibly special cased if it actually finds an incorrect inventory.)
>
> That's a good point (although it relies on apply_inventory
> as reliable as CHKMap.from_dict, which was the bug in the first place,
> but that seems reasonable).
>
> I'd prefer to stop spending time on this, though :) The number of
> affected repositories, as judged from the bug report, seems to be pretty
> low, which makes sense given the fairly particular conditions needed to
> trigger it (deleting < 5 entries from an inventory that is just over the
> size that splits it from one CHK node to many). So for now I think it's
> good enough to get this relatively crude repair tool in the hands of
> the few users that need it. We can always come back and improve it
> later if there's a reason to do that.
>
> -Andrew.
>
Actually, it is
if delete_Count > _INTERESTING ...:
so it is deleting *more* that 5 entries from an inventory that causes it
to cross the boundary.
Which just means anything that deletes a lot of entries, enough to cross
the threshold.
That said, I don't know that you need to spend a lot more time on this.
Ideally we would also have 'bzr check' notice it, though.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkx
yIUAoLfOBGoRGzb
=D6lS
-----END PGP SIGNATURE-----
- 5381. By Andrew Bennetts on 2010-08-12
-
Don't traceback if a repository doesn't support reconcile_
canonicalize_ chks.
| Andrew Bennetts (spiv) wrote : | # |
John wrote:
> That said, I don't know that you need to spend a lot more time on this.
I'm going to go ahead and merge this then. I hope I'm not taking too many liberties by doing so, but I'm confident we can incrementally improve this after landing if we need to, and it'll be of more benefit landed than not.
> Ideally we would also have 'bzr check' notice it, though.
I agree. At the moment I'm not planning to spend time on that.
| Andrew Bennetts (spiv) wrote : | # |
sent to pqm by email

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Andrew Bennetts wrote: /bugs.launchpad .net/bugs/ 613698 chks' as a hidden option to repair repositories affected by bug 522637. The basic approach is like John's bzr-check-chk plugin: regenerate the CHK map from scratch and verify that the key is the same. ecorator is not strictly necessary, but it avoids exploding a 5MB repo of lp:do into over 200MB. That is fixed by repacking, but that sort of waste might have made this repair tool impossible to run on a larger repository.
> Andrew Bennetts has proposed merging lp:~spiv/bzr/chk-canonicalizer-613698 into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> #613698 need reconcile fixer for bug 522637 "missing referenced chk root keys"
> https:/
>
>
> This adds 'bzr reconcile --canonicalize-
>
> Because so few people will ever run this code (not many repositories seem to have been adversely affected by bug 522637), in principle I'd have been happy to have this as a plugin where its rough edges wouldn't grate. But it would have required a bit more duplication of code than I'd like (although now it's done a bit less that I was expecting, I think). So I wrote this as part of the reconcile infrastructure in the core tool, but left it as a hidden option.
>
> I didn't integrate it into the regular reconcile code path because a) it's 2x slower than regular reconcile already without doing the other things reconcile does, and b) it's going to be so rarely used.
>
> The use of NoDupeAddLinesD
>
> This takes about 3 minutes to fix the broken lp:do repository on my laptop.
>
- - repo_reconciler = self.repo. reconcile( thorough= True) ze_chks: reconcile_ canonicalize_ chks() reconcile( thorough= True)
+ if self.canonicali
+ repo_reconciler = self.repo.
+ else:
+ repo_reconciler = self.repo.
^- Why use a new function rather than passing the parameter through?
I know there is a design principle that says don't add a parameter that
is static for callers (always True, always False for a given caller).
This isn't quite the same, since a user is selecting it.
Anyway, it *felt* to me more like a parameter to .reconcile()
(especially since it is a parameter up the stack).
The other thing I'm not sure about, it looks like you added: canonicalize_ chks(self) :
+ @needs_write_lock
+ def reconcile_
only to one repository type, which means you'd get an attribute error
running reconcile --canonicalize-chks on another repo type. But maybe I
misread that. (Better to just raise a BzrCommandError, I guess)
+class NoDupeAddLinesD ecorator( object) :
^- Is this actually necessary. I would have sworn I had a patch earlier erializer or something along those lines.
that did something exactly like this, because I saw conversions (I
think) waste tons of space that disappeared at repack. Then again, maybe
that was in InterDifferingS
Anyway, I think it might be reasonable to just push this as the common
case, rather than needing a wrapper. Certainly there you might be able
to do it better than a get_parent_map() call.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with...