Merge lp:~vila/bzr/374726-gc-annotate into lp:~bzr/bzr/trunk-old

Proposed by Vincent Ladeuil
Status: Merged
Merged at revision: not available
Proposed branch: lp:~vila/bzr/374726-gc-annotate
Merge into: lp:~bzr/bzr/trunk-old
Diff against target: 53 lines
To merge this branch: bzr merge lp:~vila/bzr/374726-gc-annotate
Reviewer Review Type Date Requested Status
Ian Clatworthy Abstain
Review via email: mp+6785@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Vincent Ladeuil (vila) wrote :

This pach fixes the most blatant regression for any gc repository. Whether the repository is packed or not doesn't matter anymore.

There is still a performance regression compared to knit repositories but far more limited (at most 2x) and related to gc different choices for deltas (leading to different reannotate intermediate calls, the same annotations being finally produced anyway).

Since we are not satisfied with annotate performance in either case, I'd like some feedback about whether it's worth spending time on trying to catch up with knit here (inverstigating with jam showed no obvious way to achieve that though) or go for implementing an annotation cache (which is out of this bug scope).

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Just a quick thought: I'd prefer we fixed the comment (or code) than add another comment highlighting that they differ. Assuming you're at the sprint with John, poke him about what the correct comment ought to be. :-)

review: Abstain
Revision history for this message
John A Meinel (jameinel) wrote :

Vincent Ladeuil wrote:
> Vincent Ladeuil has proposed merging lp:~vila/bzr/374726-gc-annotate into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> This pach fixes the most blatant regression for any gc repository. Whether the repository is packed or not doesn't matter anymore.
>
> There is still a performance regression compared to knit repositories but far more limited (at most 2x) and related to gc different choices for deltas (leading to different reannotate intermediate calls, the same annotations being finally produced anyway).
>
> Since we are not satisfied with annotate performance in either case, I'd like some feedback about whether it's worth spending time on trying to catch up with knit here (inverstigating with jam showed no obvious way to achieve that though) or go for implementing an annotation cache (which is out of this bug scope).
>
>
>

 Review approve

=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-04-20 08:37:32 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-05-25 19:04:59 +0000
@@ -89,6 +89,9 @@
             index_builder_class(reference_lists=1),
             # Texts: compression and per file graph, for all fileids -
so two
             # reference lists and two elements in the key tuple.
+
+ # XXX: The comment says two reference_lists, yet the param
set it
+ # to 1
             index_builder_class(reference_lists=1, key_elements=2),
             # Signatures: Just blobs to store, no compression, no parents
             # listing.

^- The original comment is just wrong. There is no "compression" parent
listed in groupcompress formats. So there is only the per-file graph,
and thus "reference_lists=1" is correct.

John
=:->

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> "Ian" == Ian Clatworthy <email address hidden> writes:

    Ian> Review: Abstain
    Ian> Just a quick thought: I'd prefer we fixed the comment (or code)
    Ian> than add another comment highlighting that they differ. Assuming
    Ian> you're at the sprint with John, poke him about what the correct
    Ian> comment ought to be. :-)

The intent was exactly that :) And, indeed, John reviewed it (mail is on
its painful way :)

    Vincent

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bzrlib/groupcompress.py'
--- bzrlib/groupcompress.py 2009-04-22 17:18:45 +0000
+++ bzrlib/groupcompress.py 2009-05-26 13:35:09 +0000
@@ -1018,15 +1018,19 @@
1018 else:1018 else:
1019 keys = [key]1019 keys = [key]
1020 parent_map = {key:()}1020 parent_map = {key:()}
1021 # So we used Graph(self) to load the parent_map, but now that we have
1022 # it, we can just query the parent map directly, so create a new Graph
1023 # object
1024 graph = _mod_graph.Graph(_mod_graph.DictParentsProvider(parent_map))
1021 head_cache = _mod_graph.FrozenHeadsCache(graph)1025 head_cache = _mod_graph.FrozenHeadsCache(graph)
1022 parent_cache = {}1026 parent_cache = {}
1023 reannotate = annotate.reannotate1027 reannotate = annotate.reannotate
1024 for record in self.get_record_stream(keys, 'topological', True):1028 for record in self.get_record_stream(keys, 'topological', True):
1025 key = record.key1029 key = record.key
1026 chunks = osutils.chunks_to_lines(record.get_bytes_as('chunked'))1030 lines = osutils.chunks_to_lines(record.get_bytes_as('chunked'))
1027 parent_lines = [parent_cache[parent] for parent in parent_map[key]]1031 parent_lines = [parent_cache[parent] for parent in parent_map[key]]
1028 parent_cache[key] = list(1032 parent_cache[key] = list(
1029 reannotate(parent_lines, chunks, key, None, head_cache))1033 reannotate(parent_lines, lines, key, None, head_cache))
1030 return parent_cache[key]1034 return parent_cache[key]
10311035
1032 def check(self, progress_bar=None):1036 def check(self, progress_bar=None):
10331037
=== modified file 'bzrlib/knit.py'
--- bzrlib/knit.py 2009-04-29 09:50:57 +0000
+++ bzrlib/knit.py 2009-05-26 13:35:09 +0000
@@ -3407,7 +3407,7 @@
3407 fulltext.)3407 fulltext.)
34083408
3409 :return: A list of (key, index_memo) records, suitable for3409 :return: A list of (key, index_memo) records, suitable for
3410 passing to read_records_iter to start reading in the raw data fro/3410 passing to read_records_iter to start reading in the raw data from
3411 the pack file.3411 the pack file.
3412 """3412 """
3413 if key in self._annotated_lines:3413 if key in self._annotated_lines:
34143414
=== modified file 'bzrlib/repofmt/groupcompress_repo.py'
--- bzrlib/repofmt/groupcompress_repo.py 2009-04-20 08:37:32 +0000
+++ bzrlib/repofmt/groupcompress_repo.py 2009-05-26 13:35:09 +0000
@@ -87,8 +87,8 @@
87 # have a regular 2-list index giving parents and compression87 # have a regular 2-list index giving parents and compression
88 # source.88 # source.
89 index_builder_class(reference_lists=1),89 index_builder_class(reference_lists=1),
90 # Texts: compression and per file graph, for all fileids - so two90 # Texts: per file graph, for all fileids - so one reference list
91 # reference lists and two elements in the key tuple.91 # and two elements in the key tuple.
92 index_builder_class(reference_lists=1, key_elements=2),92 index_builder_class(reference_lists=1, key_elements=2),
93 # Signatures: Just blobs to store, no compression, no parents93 # Signatures: Just blobs to store, no compression, no parents
94 # listing.94 # listing.