Merge lp:~agrimm/bzr-fastimport/baseline-commit into lp:bzr-fastimport
- baseline-commit
- Merge into trunk
Status: | Merged |
---|---|
Approved by: | Jelmer Vernooij |
Approved revision: | no longer in the source branch. |
Merged at revision: | 340 |
Proposed branch: | lp:~agrimm/bzr-fastimport/baseline-commit |
Merge into: | lp:bzr-fastimport |
Diff against target: |
205 lines (+109/-3) 3 files modified
cmds.py (+16/-2) exporter.py (+19/-1) tests/test_commands.py (+74/-0) |
To merge this branch: | bzr merge lp:~agrimm/bzr-fastimport/baseline-commit |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Jelmer Vernooij (community) | Approve | ||
Review via email: mp+83025@code.launchpad.net |
Commit message
Description of the change
Adds a --baseline option to fast-export to produce an export with a truncated history (a "baseline" plus all commits in the range given by the revisionspec).
Jelmer Vernooij (jelmer) wrote : | # |
Andy Grimm (agrimm) wrote : | # |
> Hi Andy,
>
> Thanks again for working on this.
>
> Generally, we try to put commands that modify fastexport streams into "bzr
> fast-import-
> also skips
If you think that having it in fast-import-filter is useful, I can do that. Will that change the logic about how the baseline is produced? Also, was your last sentence cut off here? I didn't understand it as written (skips what?)
> + help="Export an 'abolute' baseline commit prior to"
> This should probably say "absolute".
Indeed.
> + assert(
> + previd = revobj.
> It seems a bit odd that the code uses the parent of the revision that was
> specified, rather than that revision itself.
I wrestled with that a little. If the baseline argument were to take a single revision as an argument, I would agree that that is what should be used. Instead, I required the user to specify a revisionspec, which indicates the commits which they'd like to preserve. If I did not use the parent revision here, then they'd lose the first diff they wanted to be able to see. That might be confusing, though, and I'm not set on this behavior.
Do you think that it would be better to have the baseline option take a revisionspec directly, rather than requiring -r along with this option? (If I continue to require -r, I guess I need to writing some code to enforce that requirement.)
> Can you add some tests for this new behaviour?
Sure, I haven't looked at the test suite yet, but I'll give it a try.
> Cheers,
>
> Jelmer
Jelmer Vernooij (jelmer) wrote : | # |
Am 22/11/11 16:08, schrieb Andy Grimm:
>> Hi Andy,
>>
>> Thanks again for working on this.
>>
>> Generally, we try to put commands that modify fastexport streams into "bzr
>> fast-import-
>> also skips
> If you think that having it in fast-import-filter is useful, I can do that. Will that change the logic about how the baseline is produced? Also, was your last sentence cut off here? I didn't understand it as written (skips what?)
Sorry, I meant to say that using a baseline can skip processing of a lot
of revisions. If you have a 20k revision history, and specify a baseline
of r19000 then there are a lot of revisions you don't have to export. If
this functionality is added to fast-import-filter, then you would still
export all 20k revisions but then throw 19k of them away.
In other words, I think it's fine to do this in fast-export. But I
wanted to emphasize that generally we add history-modifying commands to
fast-import-filter.
>
>> + help="Export an 'abolute' baseline commit prior to"
>> This should probably say "absolute".
> Indeed.
>
>> + assert(
>> + previd = revobj.
>> It seems a bit odd that the code uses the parent of the revision that was
>> specified, rather than that revision itself.
> I wrestled with that a little. If the baseline argument were to take a single revision as an argument, I would agree that that is what should be used. Instead, I required the user to specify a revisionspec, which indicates the commits which they'd like to preserve. If I did not use the parent revision here, then they'd lose the first diff they wanted to be able to see. That might be confusing, though, and I'm not set on this behavior.
>
> Do you think that it would be better to have the baseline option take a revisionspec directly, rather than requiring -r along with this option? (If I continue to require -r, I guess I need to writing some code to enforce that requirement.)
-r specifies a revision - not a delta between revisions, so I do think
--baseline should only include the revision that was specified and not
its left hand side parent. For deltas (changes) we usually use the -c
option.
>
>> Can you add some tests for this new behaviour?
> Sure, I haven't looked at the test suite yet, but I'll give it a try.
Cool. You know where to find me on IRC if there's anything I can help with.
Cheers,
Jelmer
Andy Grimm (agrimm) wrote : | # |
> >> + assert(
> >> + previd = revobj.
> >> It seems a bit odd that the code uses the parent of the revision that was
> >> specified, rather than that revision itself.
> > I wrestled with that a little. If the baseline argument were to take a
> single revision as an argument, I would agree that that is what should be
> used. Instead, I required the user to specify a revisionspec, which indicates
> the commits which they'd like to preserve. If I did not use the parent
> revision here, then they'd lose the first diff they wanted to be able to see.
> That might be confusing, though, and I'm not set on this behavior.
> >
> > Do you think that it would be better to have the baseline option take a
> revisionspec directly, rather than requiring -r along with this option? (If I
> continue to require -r, I guess I need to writing some code to enforce that
> requirement.)
> -r specifies a revision - not a delta between revisions, so I do think
> --baseline should only include the revision that was specified and not
> its left hand side parent. For deltas (changes) we usually use the -c
> option.
Oh, I see your point here. when someone gives the command "bzr diff -r 4..6" for example, 4 is the baseline, and the diff shown is the delta between 4 and 6, so it makes sense to make the baseline the state at the first revision given. Thanks, I'll make that change.
Andy Grimm (agrimm) wrote : | # |
> > >> + assert(
> > >> + previd = revobj.
> > >> It seems a bit odd that the code uses the parent of the revision that was
> > >> specified, rather than that revision itself.
> > > I wrestled with that a little. If the baseline argument were to take a
> > single revision as an argument, I would agree that that is what should be
> > used. Instead, I required the user to specify a revisionspec, which
> indicates
> > the commits which they'd like to preserve. If I did not use the parent
> > revision here, then they'd lose the first diff they wanted to be able to
> see.
> > That might be confusing, though, and I'm not set on this behavior.
> > >
> > > Do you think that it would be better to have the baseline option take a
> > revisionspec directly, rather than requiring -r along with this option? (If
> I
> > continue to require -r, I guess I need to writing some code to enforce that
> > requirement.)
> > -r specifies a revision - not a delta between revisions, so I do think
> > --baseline should only include the revision that was specified and not
> > its left hand side parent. For deltas (changes) we usually use the -c
> > option.
>
> Oh, I see your point here. when someone gives the command "bzr diff -r 4..6"
> for example, 4 is the baseline, and the diff shown is the delta between 4 and
> 6, so it makes sense to make the baseline the state at the first revision
> given. Thanks, I'll make that change.
Upon writing the test, I realized that my original code was using parent there because the starting revision was already being excluded, so getting the parent of the first revision in "interesting" was actually just putting back the starting revision. I've changed how this works now and added a test, which should clear up any confusion about how this behaves.
Jelmer Vernooij (jelmer) : | # |
Jelmer Vernooij (jelmer) wrote : | # |
Merged, thanks !
- 340. By Jelmer Vernooij
-
Merge support for --baseline option to 'bzr export'.
Preview Diff
1 | === modified file 'cmds.py' | |||
2 | --- cmds.py 2011-10-25 23:21:02 +0000 | |||
3 | +++ cmds.py 2011-11-22 20:40:47 +0000 | |||
4 | @@ -606,6 +606,16 @@ | |||
5 | 606 | unsupported in git with an underscore instead, specify | 606 | unsupported in git with an underscore instead, specify |
6 | 607 | --rewrite-tag-names. | 607 | --rewrite-tag-names. |
7 | 608 | 608 | ||
8 | 609 | :History truncation: | ||
9 | 610 | |||
10 | 611 | When code has been significantly refactored over time (e.g., to separate | ||
11 | 612 | proprietary code from open source code), it is sometimes convenient to | ||
12 | 613 | simply truncate the revision history at a certain point. The --baseline | ||
13 | 614 | option, to be used in conjunction with -r, emits a baseline commit | ||
14 | 615 | containing the state of the entire source tree at the first requested | ||
15 | 616 | revision. This allows a user to produce a tree identical to the original | ||
16 | 617 | without munging multiple exports. | ||
17 | 618 | |||
18 | 609 | :Examples: | 619 | :Examples: |
19 | 610 | 620 | ||
20 | 611 | To produce data destined for import into Bazaar:: | 621 | To produce data destined for import into Bazaar:: |
21 | @@ -656,12 +666,16 @@ | |||
22 | 656 | help="Replace characters invalid in git with '_'" | 666 | help="Replace characters invalid in git with '_'" |
23 | 657 | " (plain mode only).", | 667 | " (plain mode only).", |
24 | 658 | ), | 668 | ), |
25 | 669 | Option('baseline', | ||
26 | 670 | help="Export an 'absolute' baseline commit prior to" | ||
27 | 671 | "the first relative commit", | ||
28 | 672 | ), | ||
29 | 659 | ] | 673 | ] |
30 | 660 | encoding_type = 'exact' | 674 | encoding_type = 'exact' |
31 | 661 | def run(self, source, destination=None, verbose=False, | 675 | def run(self, source, destination=None, verbose=False, |
32 | 662 | git_branch="master", checkpoint=10000, marks=None, | 676 | git_branch="master", checkpoint=10000, marks=None, |
33 | 663 | import_marks=None, export_marks=None, revision=None, | 677 | import_marks=None, export_marks=None, revision=None, |
35 | 664 | plain=True, rewrite_tag_names=False): | 678 | plain=True, rewrite_tag_names=False, baseline=False): |
36 | 665 | load_fastimport() | 679 | load_fastimport() |
37 | 666 | from bzrlib.branch import Branch | 680 | from bzrlib.branch import Branch |
38 | 667 | from bzrlib.plugins.fastimport import exporter | 681 | from bzrlib.plugins.fastimport import exporter |
39 | @@ -676,7 +690,7 @@ | |||
40 | 676 | outf=outf, git_branch=git_branch, checkpoint=checkpoint, | 690 | outf=outf, git_branch=git_branch, checkpoint=checkpoint, |
41 | 677 | import_marks_file=import_marks, export_marks_file=export_marks, | 691 | import_marks_file=import_marks, export_marks_file=export_marks, |
42 | 678 | revision=revision, verbose=verbose, plain_format=plain, | 692 | revision=revision, verbose=verbose, plain_format=plain, |
44 | 679 | rewrite_tags=rewrite_tag_names) | 693 | rewrite_tags=rewrite_tag_names, baseline=baseline) |
45 | 680 | return exporter.run() | 694 | return exporter.run() |
46 | 681 | 695 | ||
47 | 682 | 696 | ||
48 | 683 | 697 | ||
49 | === modified file 'exporter.py' | |||
50 | --- exporter.py 2011-10-25 23:21:02 +0000 | |||
51 | +++ exporter.py 2011-11-22 20:40:47 +0000 | |||
52 | @@ -148,7 +148,8 @@ | |||
53 | 148 | 148 | ||
54 | 149 | def __init__(self, source, outf, git_branch=None, checkpoint=-1, | 149 | def __init__(self, source, outf, git_branch=None, checkpoint=-1, |
55 | 150 | import_marks_file=None, export_marks_file=None, revision=None, | 150 | import_marks_file=None, export_marks_file=None, revision=None, |
57 | 151 | verbose=False, plain_format=False, rewrite_tags=False): | 151 | verbose=False, plain_format=False, rewrite_tags=False, |
58 | 152 | baseline=False): | ||
59 | 152 | """Export branch data in fast import format. | 153 | """Export branch data in fast import format. |
60 | 153 | 154 | ||
61 | 154 | :param plain_format: if True, 'classic' fast-import format is | 155 | :param plain_format: if True, 'classic' fast-import format is |
62 | @@ -170,6 +171,7 @@ | |||
63 | 170 | self.excluded_revisions = set() | 171 | self.excluded_revisions = set() |
64 | 171 | self.plain_format = plain_format | 172 | self.plain_format = plain_format |
65 | 172 | self.rewrite_tags = rewrite_tags | 173 | self.rewrite_tags = rewrite_tags |
66 | 174 | self.baseline = baseline | ||
67 | 173 | self._multi_author_api_available = hasattr(bzrlib.revision.Revision, | 175 | self._multi_author_api_available = hasattr(bzrlib.revision.Revision, |
68 | 174 | 'get_apparent_authors') | 176 | 'get_apparent_authors') |
69 | 175 | self.properties_to_exclude = ['authors', 'author'] | 177 | self.properties_to_exclude = ['authors', 'author'] |
70 | @@ -213,6 +215,10 @@ | |||
71 | 213 | self.note("Calculating the revisions to exclude ...") | 215 | self.note("Calculating the revisions to exclude ...") |
72 | 214 | self.excluded_revisions = set([rev_id for rev_id, _, _, _ in | 216 | self.excluded_revisions = set([rev_id for rev_id, _, _, _ in |
73 | 215 | self.branch.iter_merge_sorted_revisions(start_rev_id)]) | 217 | self.branch.iter_merge_sorted_revisions(start_rev_id)]) |
74 | 218 | if self.baseline: | ||
75 | 219 | # needed so the first relative commit knows its parent | ||
76 | 220 | self.excluded_revisions.remove(start_rev_id) | ||
77 | 221 | view_revisions.insert(0, start_rev_id) | ||
78 | 216 | return list(view_revisions) | 222 | return list(view_revisions) |
79 | 217 | 223 | ||
80 | 218 | def run(self): | 224 | def run(self): |
81 | @@ -225,6 +231,8 @@ | |||
82 | 225 | self._commit_total) | 231 | self._commit_total) |
83 | 226 | if not self.plain_format: | 232 | if not self.plain_format: |
84 | 227 | self.emit_features() | 233 | self.emit_features() |
85 | 234 | if self.baseline: | ||
86 | 235 | self.emit_baseline(interesting.pop(0), self.git_branch) | ||
87 | 228 | for revid in interesting: | 236 | for revid in interesting: |
88 | 229 | self.emit_commit(revid, self.git_branch) | 237 | self.emit_commit(revid, self.git_branch) |
89 | 230 | if self.branch.supports_tags(): | 238 | if self.branch.supports_tags(): |
90 | @@ -302,6 +310,16 @@ | |||
91 | 302 | for feature in sorted(commands.FEATURE_NAMES): | 310 | for feature in sorted(commands.FEATURE_NAMES): |
92 | 303 | self.print_cmd(commands.FeatureCommand(feature)) | 311 | self.print_cmd(commands.FeatureCommand(feature)) |
93 | 304 | 312 | ||
94 | 313 | def emit_baseline(self, revid, git_branch): | ||
95 | 314 | # Emit a full source tree of the first commit's parent | ||
96 | 315 | git_ref = 'refs/heads/%s' % (git_branch,) | ||
97 | 316 | revobj = self.branch.repository.get_revision(revid) | ||
98 | 317 | mark = 1 | ||
99 | 318 | self.revid_to_mark[revid] = mark | ||
100 | 319 | file_cmds = self._get_filecommands(bzrlib.revision.NULL_REVISION, revid) | ||
101 | 320 | self.print_cmd(self._get_commit_command(git_ref, mark, revobj, | ||
102 | 321 | file_cmds)) | ||
103 | 322 | |||
104 | 305 | def emit_commit(self, revid, git_branch): | 323 | def emit_commit(self, revid, git_branch): |
105 | 306 | if revid in self.revid_to_mark or revid in self.excluded_revisions: | 324 | if revid in self.revid_to_mark or revid in self.excluded_revisions: |
106 | 307 | return | 325 | return |
107 | 308 | 326 | ||
108 | === modified file 'tests/test_commands.py' | |||
109 | --- tests/test_commands.py 2011-10-17 10:25:55 +0000 | |||
110 | +++ tests/test_commands.py 2011-11-22 20:40:47 +0000 | |||
111 | @@ -16,6 +16,7 @@ | |||
112 | 16 | """Test the command implementations.""" | 16 | """Test the command implementations.""" |
113 | 17 | 17 | ||
114 | 18 | import os | 18 | import os |
115 | 19 | import re | ||
116 | 19 | import tempfile | 20 | import tempfile |
117 | 20 | import gzip | 21 | import gzip |
118 | 21 | 22 | ||
119 | @@ -58,6 +59,40 @@ | |||
120 | 58 | self.assertIsNot("bla", stream.read()) | 59 | self.assertIsNot("bla", stream.read()) |
121 | 59 | 60 | ||
122 | 60 | 61 | ||
123 | 62 | fast_export_baseline_data = """commit refs/heads/master | ||
124 | 63 | mark :1 | ||
125 | 64 | committer | ||
126 | 65 | data 15 | ||
127 | 66 | add c, remove b | ||
128 | 67 | M 644 inline a | ||
129 | 68 | data 13 | ||
130 | 69 | test 1 | ||
131 | 70 | test 3 | ||
132 | 71 | M 644 inline c | ||
133 | 72 | data 6 | ||
134 | 73 | test 4 | ||
135 | 74 | commit refs/heads/master | ||
136 | 75 | mark :2 | ||
137 | 76 | committer | ||
138 | 77 | data 14 | ||
139 | 78 | modify a again | ||
140 | 79 | from :1 | ||
141 | 80 | M 644 inline a | ||
142 | 81 | data 20 | ||
143 | 82 | test 1 | ||
144 | 83 | test 3 | ||
145 | 84 | test 5 | ||
146 | 85 | commit refs/heads/master | ||
147 | 86 | mark :3 | ||
148 | 87 | committer | ||
149 | 88 | data 5 | ||
150 | 89 | add d | ||
151 | 90 | from :2 | ||
152 | 91 | M 644 inline d | ||
153 | 92 | data 6 | ||
154 | 93 | test 6 | ||
155 | 94 | """ | ||
156 | 95 | |||
157 | 61 | class TestFastExport(ExternalBase): | 96 | class TestFastExport(ExternalBase): |
158 | 62 | 97 | ||
159 | 63 | def test_empty(self): | 98 | def test_empty(self): |
160 | @@ -99,6 +134,45 @@ | |||
161 | 99 | # "bad Tag" should be exported as bad_Tag | 134 | # "bad Tag" should be exported as bad_Tag |
162 | 100 | self.assertNotEqual(-1, data.find("reset refs/tags/bad_Tag")) | 135 | self.assertNotEqual(-1, data.find("reset refs/tags/bad_Tag")) |
163 | 101 | 136 | ||
164 | 137 | def test_baseline_option(self): | ||
165 | 138 | tree = self.make_branch_and_tree("bl") | ||
166 | 139 | |||
167 | 140 | # Revision 1 | ||
168 | 141 | file('bl/a', 'w').write('test 1') | ||
169 | 142 | tree.add('a') | ||
170 | 143 | tree.commit(message='add a') | ||
171 | 144 | |||
172 | 145 | # Revision 2 | ||
173 | 146 | file('bl/b', 'w').write('test 2') | ||
174 | 147 | file('bl/a', 'a').write('\ntest 3') | ||
175 | 148 | tree.add('b') | ||
176 | 149 | tree.commit(message='add b, modify a') | ||
177 | 150 | |||
178 | 151 | # Revision 3 | ||
179 | 152 | file('bl/c', 'w').write('test 4') | ||
180 | 153 | tree.add('c') | ||
181 | 154 | tree.remove('b') | ||
182 | 155 | tree.commit(message='add c, remove b') | ||
183 | 156 | |||
184 | 157 | # Revision 4 | ||
185 | 158 | file('bl/a', 'a').write('\ntest 5') | ||
186 | 159 | tree.commit(message='modify a again') | ||
187 | 160 | |||
188 | 161 | # Revision 5 | ||
189 | 162 | file('bl/d', 'w').write('test 6') | ||
190 | 163 | tree.add('d') | ||
191 | 164 | tree.commit(message='add d') | ||
192 | 165 | |||
193 | 166 | # This exports the baseline state at Revision 3, | ||
194 | 167 | # followed by the deltas for 4 and 5 | ||
195 | 168 | data = self.run_bzr("fast-export --baseline -r 3.. bl")[0] | ||
196 | 169 | data = re.sub('committer.*', 'committer', data) | ||
197 | 170 | self.assertEquals(fast_export_baseline_data, data) | ||
198 | 171 | |||
199 | 172 | # Also confirm that --baseline with no args is identical to full export | ||
200 | 173 | data1 = self.run_bzr("fast-export --baseline bl")[0] | ||
201 | 174 | data2 = self.run_bzr("fast-export bl")[0] | ||
202 | 175 | self.assertEquals(data1, data2) | ||
203 | 102 | 176 | ||
204 | 103 | simple_fast_import_stream = """commit refs/heads/master | 177 | simple_fast_import_stream = """commit refs/heads/master |
205 | 104 | mark :1 | 178 | mark :1 |
Hi Andy,
Thanks again for working on this.
Generally, we try to put commands that modify fastexport streams into "bzr fast-import- filter" , rather than "bzr fast-export". In this case, I guess it also skips
+ help="Export an 'abolute' baseline commit prior to"
This should probably say "absolute".
+ assert( len(revobj. parent_ ids)) parent_ ids[0]
+ previd = revobj.
It seems a bit odd that the code uses the parent of the revision that was specified, rather than that revision itself.
Can you add some tests for this new behaviour?
Cheers,
Jelmer